PREFACE


Summary of Changes in 
This Edition 


This textbook is an expanded version of Elementary Linear Algebra, eleventh edition, by 
Howard Anton. The first nine chapters of this book are identical to the first nine chapters 
of that text; the tenth chapter consists of twenty applications of linear algebra drawn 
from business, economics, engineering, physics, computer science, approximation theory, 
ecology, demography, and genetics. The applications are largely independent of each 
other, and each includes a list of mathematical prerequisites. Thus, each instructor has 
the flexibility to choose those applications that are suitable for his or her students and to 
incorporate each application anywhere in the course after the mathematical prerequisites 
have been satisfied. Chapters 1-9 include simpler treatments of some of the applications 
covered in more depth in Chapter 10. 

This edition gives an introductory treatment of linear algebra that is suitable for a 
first undergraduate course. Its aim is to present the fundamentals of linear algebra in the 
clearest possible way — sound pedagogy is the main consideration. Although calculus 
is not a prerequisite, there is some optional material that is clearly marked for students 
with a calculus background. If desired, that material can be omitted without loss of 
continuity. 

Technology is not required to use this text, but for instructors who would like to 
use MATLAB, Mathematica, Maple, or calculators with linear algebra capabilities, we 
have posted some supporting material that can be accessed at either of the following 
companion websites: 

www.howardanton.com 

www.wiley.com/college/anton 

Many parts of the text have been revised based on an extensive set of reviews. Here are 
the primary changes: 

Earlier Linear Transformations Linear transformations are introduced earlier (starting 
in Section 1.8). Many exercise sets, as well as parts of Chapters 4 and 8, have been 
revised in keeping with the earlier introduction of linear transformations. 

New Exercises Hundreds of new exercises of all types have been added throughout 
the text. 

Technology Exercises requiring technology such as MATLAB, Mathematica, or Maple 
have been added and supporting data sets have been posted on the companion websites 
for this text. The use of technology is not essential, and these exercises can be omitted 
without affecting the flow of the text. 

Exercise Sets Reorganized Many multiple-part exercises have been subdivided to create
a better balance between odd and even exercise types. To simplify the instructor's task
of creating assignments, exercise sets have been arranged in clearly defined categories.

Reorganization In addition to the earlier introduction of linear transformations, the
old Section 4.12 on Dynamical Systems and Markov Chains has been moved to Chapter 5
in order to incorporate material on eigenvalues and eigenvectors.

Rewriting Section 9.3 on Internet Search Engines from the previous edition has been
rewritten to reflect more accurately how the Google PageRank algorithm works in
practice. That section is now Section 10.20 of the applications version of this text.

Appendix A Rewritten The appendix on reading and writing proofs has been expanded
and revised to better support courses that focus on proving theorems.

Web Materials Supplementary web materials now include various applications modules,
three modules on linear programming, and an alternative presentation of determinants
based on permutations.

Applications Chapter Section 10.2 of the previous edition has been moved to the
websites that accompany this text, so it is now part of a three-module set on Linear
Programming. A new section on Internet search engines has been added that explains
the PageRank algorithm used by Google.


Hallmark Features

Relationships Among Concepts One of our main pedagogical goals is to convey to the
student that linear algebra is a cohesive subject and not simply a collection of isolated
definitions and techniques. One way in which we do this is by using a crescendo of
Equivalent Statements theorems that continually revisit relationships among systems
of equations, matrices, determinants, vectors, linear transformations, and eigenvalues.
To get a general sense of how we use this technique see Theorems 1.5.3, 1.6.4, 2.3.8,
4.8.8, and then Theorem 5.1.5, for example.

Smooth Transition to Abstraction Because the transition from $R^n$ to general vector
spaces is difficult for many students, considerable effort is devoted to explaining the
purpose of abstraction and helping the student to "visualize" abstract ideas by drawing
analogies to familiar geometric ideas.

Mathematical Precision When reasonable, we try to be mathematically precise. In
keeping with the level of student audience, proofs are presented in a patient style that
is tailored for beginners.

Suitability for a Diverse Audience This text is designed to serve the needs of students
in engineering, computer science, biology, physics, business, and economics as well as
those majoring in mathematics.

Historical Notes To give the students a sense of mathematical history and to convey
that real people created the mathematical theorems and equations they are studying, we
have included numerous Historical Notes that put the topic being studied in historical
perspective.


About the Exercises

Graded Exercise Sets Each exercise set in the first nine chapters begins with routine
drill problems and progresses to problems with more substance. These are followed
by three categories of exercises, the first focusing on proofs, the second on true/false
exercises, and the third on problems requiring technology. This compartmentalization
is designed to simplify the instructor's task of selecting exercises for homework.

Proof Exercises Linear algebra courses vary widely in their emphasis on proofs, so
exercises involving proofs have been grouped and compartmentalized for easy
identification. Appendix A has been rewritten to provide students more guidance on
proving theorems.

True/False Exercises The True/False exercises are designed to check conceptual
understanding and logical reasoning. To avoid pure guesswork, the students are required
to justify their responses in some way.

Technology Exercises Exercises that require technology have also been grouped. To
avoid burdening the student with keyboarding, the relevant data files have been posted
on the websites that accompany this text.

Supplementary Exercises Each of the first nine chapters ends with a set of supplementary
exercises that draw on all topics in the chapter. These tend to be more challenging.


Supplementary Materials for Students

Student Solutions Manual This supplement provides detailed solutions to most
odd-numbered exercises (ISBN 978-1-118-464427).

Data Files Data files for the technology exercises are posted on the companion websites 
that accompany this text. 

MATLAB Manual and Linear Algebra Labs This supplement contains a set of MATLAB
laboratory projects written by Dan Seth of West Texas A&M University. It is designed
to help students learn key linear algebra concepts by using MATLAB and is available in
PDF form without charge to students at schools adopting the 11th edition of the text.

Videos A complete set of Daniel Solow's How to Read and Do Proofs videos is available
to students through WileyPLUS as well as the companion websites that accompany
this text. Those materials include a guide to help students locate the lecture videos
appropriate for specific proofs in the text.


Supplementary Materials for Instructors

Instructor's Solutions Manual This supplement provides worked-out solutions to most
exercises in the text (ISBN 978-1-118-434482).

PowerPoint Presentations PowerPoint slides are provided that display important
definitions, examples, graphics, and theorems in the book. These can also be distributed
to students as review materials or to simplify note taking.

Test Bank Test questions and sample exams are available in PDF or LaTeX form.

WileyPLUS An online environment for effective teaching and learning. WileyPLUS
builds student confidence by taking the guesswork out of studying and by providing a
clear roadmap of what to do, how to do it, and whether it was done right. Its purpose is
to motivate and foster initiative so instructors can have a greater impact on classroom
achievement and beyond.


A Guide for the Instructor

Although linear algebra courses vary widely in content and philosophy, most courses
fall into two categories: those with about 40 lectures and those with about 30 lectures.
Accordingly, we have created long and short templates as possible starting points for
constructing a course outline. Of course, these are just guides, and you will certainly
want to customize them to fit your local interests and requirements. Neither of these
sample templates includes applications or the numerical methods in Chapter 9. Those
can be added, if desired, and as time permits.

                                                        Long Template   Short Template
Chapter 1: Systems of Linear Equations and Matrices     8 lectures      6 lectures
Chapter 2: Determinants                                 3 lectures      2 lectures
Chapter 3: Euclidean Vector Spaces                      4 lectures      3 lectures
Chapter 4: General Vector Spaces                        10 lectures     9 lectures
Chapter 5: Eigenvalues and Eigenvectors                 3 lectures      3 lectures
Chapter 6: Inner Product Spaces                         3 lectures      1 lecture
Chapter 7: Diagonalization and Quadratic Forms          4 lectures      3 lectures
Chapter 8: General Linear Transformations               4 lectures      3 lectures
Total:                                                  39 lectures     30 lectures


Reviewers
CHAPTER CONTENTS
Leontief Input-Output Models


INTRODUCTION

Information in science, business, and mathematics is often organized into rows and
columns to form rectangular arrays called "matrices" (plural of "matrix"). Matrices
often appear as tables of numerical data that arise from physical observations, but they
occur in various mathematical contexts as well. For example, we will see in this chapter
that all of the information required to solve a system of equations such as

5x + y = 3
2x - y = 4

is embodied in the matrix

\[ \begin{bmatrix} 5 & 1 & 3 \\ 2 & -1 & 4 \end{bmatrix} \]

and that the solution of the system can be obtained by performing appropriate 
operations on this matrix. This is particularly important in developing computer 
programs for solving systems of equations because computers are well suited for 
manipulating arrays of numerical information. However, matrices are not simply a 
notational tool for solving systems of equations; they can be viewed as mathematical 
objects in their own right, and there is a rich and important theory associated with 
them that has a multitude of practical applications. It is the study of matrices and 
related topics that forms the mathematical field that we call “linear algebra.” In this 
chapter we will begin our study of matrices. 
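To make the idea of storing a system as an array concrete, here is a minimal Python sketch. The names `augmented` and `satisfies` are illustrative choices, not notation from the text. Each row holds the coefficients of one equation followed by its right-hand side, and a candidate pair is checked by substitution.

```python
from fractions import Fraction

# Matrix for the system 5x + y = 3, 2x - y = 4:
# each row is the coefficients of one equation followed by its right-hand side.
augmented = [
    [5, 1, 3],
    [2, -1, 4],
]

def satisfies(aug, candidate):
    """Return True if substituting `candidate` makes every equation true."""
    for row in aug:
        *coeffs, rhs = row
        lhs = sum(Fraction(c) * Fraction(v) for c, v in zip(coeffs, candidate))
        if lhs != rhs:
            return False
    return True

print(satisfies(augmented, (1, -2)))  # True:  5(1) + (-2) = 3 and 2(1) - (-2) = 4
print(satisfies(augmented, (1, 2)))   # False: 5(1) + 2 = 7, not 3
```

Exact `Fraction` arithmetic is used so that the comparison with the right-hand side is not affected by floating-point rounding.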
1.1 Introduction to Systems of Linear Equations

Systems of linear equations and their solutions constitute one of the major topics that we 
will study in this course. In this first section we will introduce some basic terminology and 
discuss a method for solving such systems. 

Linear Equations

Recall that in two dimensions a line in a rectangular xy-coordinate system can be
represented by an equation of the form

\[ ax + by = c \quad (a, b \text{ not both } 0) \]

and in three dimensions a plane in a rectangular xyz-coordinate system can be
represented by an equation of the form

\[ ax + by + cz = d \quad (a, b, c \text{ not all } 0) \]

These are examples of "linear equations," the first being a linear equation in the variables
x and y and the second a linear equation in the variables x, y, and z. More generally, we
define a linear equation in the n variables $x_1, x_2, \ldots, x_n$ to be one that can be expressed
in the form

\[ a_1x_1 + a_2x_2 + \cdots + a_nx_n = b \tag{1} \]

where $a_1, a_2, \ldots, a_n$ and b are constants, and the a's are not all zero. In the special cases
where n = 2 or n = 3, we will often use variables without subscripts and write linear
equations as

\[ a_1x + a_2y = b \quad (a_1, a_2 \text{ not both } 0) \tag{2} \]

\[ a_1x + a_2y + a_3z = b \quad (a_1, a_2, a_3 \text{ not all } 0) \tag{3} \]

In the special case where b = 0, Equation (1) has the form

\[ a_1x_1 + a_2x_2 + \cdots + a_nx_n = 0 \tag{4} \]

which is called a homogeneous linear equation in the variables $x_1, x_2, \ldots, x_n$.


► EXAMPLE 1 Linear Equations

Observe that a linear equation does not involve any products or roots of variables. All
variables occur only to the first power and do not appear, for example, as arguments of
trigonometric, logarithmic, or exponential functions. The following are linear equations:

\[ x + 3y = 7 \qquad\qquad x_1 - 2x_2 - 3x_3 + x_4 = 0 \]
\[ \tfrac{1}{2}x - y + 3z = -1 \qquad\qquad x_1 + x_2 + \cdots + x_n = 1 \]

The following are not linear equations:

\[ x + 3y^2 = 4 \qquad\qquad 3x + 2y - xy = 5 \]
\[ \sin x + y = 0 \qquad\qquad \sqrt{x_1} + 2x_2 + x_3 = 1 \]


A finite set of linear equations is called a system of linear equations or, more briefly,
a linear system. The variables are called unknowns. For example, system (5) that follows
has unknowns x and y, and system (6) has unknowns $x_1$, $x_2$, and $x_3$.

\[
\begin{aligned} 5x + y &= 3 \\ 2x - y &= 4 \end{aligned}
\qquad\qquad
\begin{aligned} 4x_1 - x_2 + 3x_3 &= -1 \\ 3x_1 + x_2 + 9x_3 &= -4 \end{aligned}
\tag{5-6}
\]

A general linear system of m equations in the n unknowns $x_1, x_2, \ldots, x_n$ can be written
as

\[
\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= b_1 \\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= b_2 \\
\vdots \qquad\qquad & \\
a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n &= b_m
\end{aligned}
\tag{7}
\]

Note: The double subscripting on the coefficients $a_{ij}$ of the unknowns gives their location
in the system. The first subscript indicates the equation in which the coefficient occurs,
and the second indicates which unknown it multiplies. Thus, $a_{12}$ is in the first equation
and multiplies $x_2$.

A solution of a linear system in n unknowns $x_1, x_2, \ldots, x_n$ is a sequence of n numbers
$s_1, s_2, \ldots, s_n$ for which the substitution

\[ x_1 = s_1, \quad x_2 = s_2, \quad \ldots, \quad x_n = s_n \]

makes each equation a true statement. For example, the system in (5) has the solution

\[ x = 1, \quad y = -2 \]

and the system in (6) has the solution

\[ x_1 = 1, \quad x_2 = 2, \quad x_3 = -1 \]

These solutions can be written more succinctly as

\[ (1, -2) \quad \text{and} \quad (1, 2, -1) \]

in which the names of the variables are omitted. This notation allows us to interpret
these solutions geometrically as points in two-dimensional and three-dimensional space.
More generally, a solution

\[ x_1 = s_1, \quad x_2 = s_2, \quad \ldots, \quad x_n = s_n \]

of a linear system in n unknowns can be written as

\[ (s_1, s_2, \ldots, s_n) \]

which is called an ordered n-tuple. With this notation it is understood that all variables
appear in the same order in each equation. If n = 2, then the n-tuple is called an ordered
pair, and if n = 3, then it is called an ordered triple.


Linear Systems in Two and Three Unknowns

Linear systems in two unknowns arise in connection with intersections of lines. For 
example, consider the linear system 

flix + b\y = ci 
112X + b2y = C2 

in which the graphs of the equations are lines in the xy-plane. Each solution (x, y) of this 
system corresponds to a point of intersection of the lines, so there are three possibilities 
(Figure 1.1.1): 

1. The lines may be parallel and distinct, in which case there is no intersection and
consequently no solution.

2. The lines may intersect at only one point, in which case the system has exactly one
solution.

3. The lines may coincide, in which case there are infinitely many points of intersection
(the points on the common line) and consequently infinitely many solutions.

[Figure 1.1.1]

In general, we say that a linear system is consistent if it has at least one solution and
inconsistent if it has no solutions. Thus, a consistent linear system of two equations in
two unknowns has either one solution or infinitely many solutions; there are no other
possibilities. The same is true for a linear system of three equations in three unknowns

\[
\begin{aligned}
a_1x + b_1y + c_1z &= d_1 \\
a_2x + b_2y + c_2z &= d_2 \\
a_3x + b_3y + c_3z &= d_3
\end{aligned}
\]

in which the graphs of the equations are planes. The solutions of the system, if any,
correspond to points where all three planes intersect, so again we see that there are only
three possibilities: no solutions, one solution, or infinitely many solutions (Figure 1.1.2).



[Figure 1.1.2. Panels:
  No solutions (three parallel planes; no common intersection)
  No solutions (two parallel planes; no common intersection)
  No solutions (no common intersection)
  No solutions (two coincident planes parallel to the third; no common intersection)
  One solution (intersection is a point)
  Infinitely many solutions (intersection is a line)
  Infinitely many solutions (planes are all coincident; intersection is a plane)
  Infinitely many solutions (two coincident planes; intersection is a line)]


We will prove later that our observations about the number of solutions of linear 
systems of two equations in two unknowns and linear systems of three equations in 
three unknowns actually hold for all linear systems. That is: 

Every system of linear equations has zero, one, or infinitely many solutions. There are
no other possibilities.
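For two equations in two unknowns, the three cases can be told apart from the coefficients alone. The following Python sketch is one way to do so, not a method taken from the text. The function name `classify_2x2` is hypothetical, and it assumes each equation really is a line (its two coefficients are not both zero).

```python
def classify_2x2(a1, b1, c1, a2, b2, c2):
    """Classify the system a1*x + b1*y = c1, a2*x + b2*y = c2.

    Assumes each equation is a line (a and b not both zero).
    Returns "none", "one", or "infinitely many".
    """
    # The lines cross at exactly one point unless their coefficient
    # pairs (a1, b1) and (a2, b2) are proportional.
    if a1 * b2 - a2 * b1 != 0:
        return "one"
    # Otherwise the lines are parallel. They coincide exactly when the
    # right-hand sides are in the same proportion as the coefficients.
    if a1 * c2 - a2 * c1 == 0 and b1 * c2 - b2 * c1 == 0:
        return "infinitely many"
    return "none"

print(classify_2x2(1, -1, 1, 2, 1, 6))     # one
print(classify_2x2(1, 1, 4, 3, 3, 6))      # none
print(classify_2x2(4, -2, 1, 16, -8, 4))   # infinitely many
```

With integer or `fractions.Fraction` inputs the comparisons are exact; with floating-point inputs the tests against zero would need a tolerance.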
► EXAMPLE 2 A Linear System with One Solution

Solve the linear system

x - y = 1
2x + y = 6

Solution We can eliminate x from the second equation by adding -2 times the first
equation to the second. This yields the simplified system

x - y = 1
3y = 4

From the second equation we obtain $y = \tfrac{4}{3}$, and on substituting this value in the first
equation we obtain $x = 1 + y = \tfrac{7}{3}$. Thus, the system has the unique solution

\[ x = \tfrac{7}{3}, \quad y = \tfrac{4}{3} \]

Geometrically, this means that the lines represented by the equations in the system
intersect at the single point $\left(\tfrac{7}{3}, \tfrac{4}{3}\right)$. We leave it for you to check this by graphing the
lines.
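The elimination in Example 2 can be replayed in a few lines of Python. This is a sketch under stated assumptions: the helper name `solve_2x2_by_elimination` is hypothetical, and it handles only the case where the first coefficient is nonzero and the system has exactly one solution.

```python
from fractions import Fraction

def solve_2x2_by_elimination(a1, b1, c1, a2, b2, c2):
    """Solve a1*x + b1*y = c1, a2*x + b2*y = c2.

    Assumes a1 != 0 and that the system has a unique solution.
    """
    a1, b1, c1, a2, b2, c2 = map(Fraction, (a1, b1, c1, a2, b2, c2))
    m = -a2 / a1                          # multiple of equation 1 to add to equation 2
    b2, c2 = b2 + m * b1, c2 + m * c1     # x is now eliminated from equation 2
    y = c2 / b2                           # solve the simplified second equation
    x = (c1 - b1 * y) / a1                # substitute y back into equation 1
    return x, y

x, y = solve_2x2_by_elimination(1, -1, 1, 2, 1, 6)
print(x, y)  # 7/3 4/3
```

Here `m` works out to -2, matching the multiple used in the solution above.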


► EXAMPLE 3 A Linear System with No Solutions 

Solve the linear system 

x + y = 4
3x + 3y = 6

Solution We can eliminate x from the second equation by adding —3 times the first 
equation to the second equation. This yields the simplified system 

x + y= 4 
0 = -6 

The second equation is contradictory, so the given system has no solution. Geometrically, 
this means that the lines corresponding to the equations in the original system are parallel 
and distinct. We leave it for you to check this by graphing the lines or by showing that 
they have the same slope but different y-intercepts. 


► EXAMPLE 4 A Linear System with Infinitely Many Solutions 

Solve the linear system 

4x — 2y = 1 
16x - 8y = 4 

Solution We can eliminate x from the second equation by adding —4 times the first 
equation to the second. This yields the simplified system 

4x — 2y = 1 
0 = 0 

The second equation does not impose any restrictions on x and y and hence can be 
omitted. Thus, the solutions of the system are those values of x and y that satisfy the 
single equation 

4x - 2y = 1 (8)

Geometrically, this means the lines corresponding to the two equations in the original
system coincide. One way to describe the solution set is to solve this equation for x in
terms of y to obtain $x = \tfrac{1}{4} + \tfrac{1}{2}y$ and then assign an arbitrary value t (called a parameter)
to y. This allows us to express the solution by the pair of equations (called parametric
equations)

\[ x = \tfrac{1}{4} + \tfrac{1}{2}t, \quad y = t \]

We can obtain specific numerical solutions from these equations by substituting
numerical values for the parameter t. For example, t = 0 yields the solution $\left(\tfrac{1}{4}, 0\right)$, t = 1
yields the solution $\left(\tfrac{3}{4}, 1\right)$, and t = -1 yields the solution $\left(-\tfrac{1}{4}, -1\right)$. You can confirm
that these are solutions by substituting their coordinates into the given equations.

Note: In Example 4 we could have also obtained parametric equations for the solutions
by solving (8) for y in terms of x and letting x = t be the parameter. The resulting
parametric equations would look different but would define the same solution set.


► EXAMPLE 5 A Linear System with Infinitely Many Solutions

Solve the linear system

x - y + 2z = 5
2x - 2y + 4z = 10
3x - 3y + 6z = 15

Solution This system can be solved by inspection, since the second and third equations
are multiples of the first. Geometrically, this means that the three planes coincide and
that those values of x, y, and z that satisfy the equation

x - y + 2z = 5 (9)

automatically satisfy all three equations. Thus, it suffices to find the solutions of (9).
We can do this by first solving this equation for x in terms of y and z, then assigning
arbitrary values r and s (parameters) to these two variables, and then expressing the
solution by the three parametric equations

x = 5 + r - 2s, y = r, z = s

Specific solutions can be obtained by choosing numerical values for the parameters r
and s. For example, taking r = 1 and s = 0 yields the solution (6, 1, 0).


Augmented Matrices and Elementary Row Operations

Note: As noted in the introduction to this chapter, the term "matrix" is used in
mathematics to denote a rectangular array of numbers. In a later section we will study
matrices in detail, but for now we will only be concerned with augmented matrices for
linear systems.


As the number of equations and unknowns in a linear system increases, so does the
complexity of the algebra involved in finding solutions. The required computations can
be made more manageable by simplifying notation and standardizing procedures. For
example, by mentally keeping track of the location of the +'s, the x's, and the ='s in the
linear system

\[
\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= b_1 \\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= b_2 \\
\vdots \qquad\qquad & \\
a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n &= b_m
\end{aligned}
\]

we can abbreviate the system by writing only the rectangular array of numbers

\[
\begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n} & b_1 \\
a_{21} & a_{22} & \cdots & a_{2n} & b_2 \\
\vdots & \vdots & & \vdots & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn} & b_m
\end{bmatrix}
\]

This is called the augmented matrix for the system. For example, the augmented matrix
for the system of equations

\[
\begin{aligned}
x_1 + x_2 + 2x_3 &= 9 \\
2x_1 + 4x_2 - 3x_3 &= 1 \\
3x_1 + 6x_2 - 5x_3 &= 0
\end{aligned}
\]

is

\[
\begin{bmatrix} 1 & 1 & 2 & 9 \\ 2 & 4 & -3 & 1 \\ 3 & 6 & -5 & 0 \end{bmatrix}
\]

The basic method for solving a linear system is to perform algebraic operations on
the system that do not alter the solution set and that produce a succession of increasingly
simpler systems, until a point is reached where it can be ascertained whether the system
is consistent, and if so, what its solutions are. Typically, the algebraic operations are:

1. Multiply an equation through by a nonzero constant.

2. Interchange two equations.

3. Add a constant times one equation to another.

Since the rows (horizontal lines) of an augmented matrix correspond to the equations in
the associated system, these three operations correspond to the following operations on
the rows of the augmented matrix:

1. Multiply a row through by a nonzero constant.

2. Interchange two rows.

3. Add a constant times one row to another.

These are called elementary row operations on a matrix. 
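The three elementary row operations translate directly into code. The sketch below is illustrative only: the function names `scale_row`, `swap_rows`, and `add_multiple` are hypothetical, a matrix is assumed to be stored as a list of rows, and rows are numbered from 0 as is usual in Python.

```python
from fractions import Fraction

def scale_row(matrix, i, c):
    """Operation 1: multiply row i through by a nonzero constant c."""
    if c == 0:
        raise ValueError("the constant must be nonzero")
    matrix[i] = [Fraction(c) * entry for entry in matrix[i]]

def swap_rows(matrix, i, j):
    """Operation 2: interchange rows i and j."""
    matrix[i], matrix[j] = matrix[j], matrix[i]

def add_multiple(matrix, source, target, c):
    """Operation 3: add c times row `source` to row `target`."""
    matrix[target] = [
        t + Fraction(c) * s for s, t in zip(matrix[source], matrix[target])
    ]

# Augmented matrix with rows (1, 1, 2 | 9), (2, 4, -3 | 1), (3, 6, -5 | 0).
m = [[1, 1, 2, 9], [2, 4, -3, 1], [3, 6, -5, 0]]
add_multiple(m, 0, 1, -2)   # add -2 times the first row to the second
print(m[1] == [0, 2, -7, -17])  # True
```

Each function changes the matrix in place. The zero check in `scale_row` reflects the requirement that the constant be nonzero, since multiplying a row by zero would discard an equation and could change the solution set.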

In the following example we will illustrate how to use elementary row operations and 
an augmented matrix to solve a linear system in three unknowns. Since a systematic 
procedure for solving linear systems will be developed in the next section, do not worry 
about how the steps in the example were chosen. Your objective here should be simply 
to understand the computations. 


► EXAMPLE 6 Using Elementary Row Operations 

In the left column we solve a system of linear equations by operating on the equations in 
the system, and in the right column we solve the same system by operating on the rows 
of the augmented matrix. 


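\[
\begin{aligned} x + y + 2z &= 9 \\ 2x + 4y - 3z &= 1 \\ 3x + 6y - 5z &= 0 \end{aligned}
\qquad\qquad
\begin{bmatrix} 1 & 1 & 2 & 9 \\ 2 & 4 & -3 & 1 \\ 3 & 6 & -5 & 0 \end{bmatrix}
\]

Add -2 times the first equation to the second (equivalently, add -2 times the first row
to the second) to obtain

\[
\begin{aligned} x + y + 2z &= 9 \\ 2y - 7z &= -17 \\ 3x + 6y - 5z &= 0 \end{aligned}
\qquad\qquad
\begin{bmatrix} 1 & 1 & 2 & 9 \\ 0 & 2 & -7 & -17 \\ 3 & 6 & -5 & 0 \end{bmatrix}
\]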



Historical Note The first known use of augmented matrices appeared
between 200 B.C. and 100 B.C. in a Chinese manuscript entitled Nine
Chapters of Mathematical Art. The coefficients were arranged in
columns rather than in rows, as today, but remarkably the system was
solved by performing a succession of operations on the columns. The
actual use of the term augmented matrix appears to have been introduced
by the American mathematician Maxime Bôcher in his book
Introduction to Higher Algebra, published in 1907. In addition to being an
outstanding research mathematician and an expert in Latin, chemistry,
philosophy, zoology, geography, meteorology, art, and music, Bôcher
was an outstanding expositor of mathematics whose elementary textbooks
were greatly appreciated by students and are still in demand
today.

[Image: Maxime Bôcher (1867-1918). Courtesy of the American Mathematical Society,
www.ams.org]
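Add -3 times the first equation to the third (equivalently, add -3 times the first row
to the third) to obtain

\[
\begin{aligned} x + y + 2z &= 9 \\ 2y - 7z &= -17 \\ 3y - 11z &= -27 \end{aligned}
\qquad\qquad
\begin{bmatrix} 1 & 1 & 2 & 9 \\ 0 & 2 & -7 & -17 \\ 0 & 3 & -11 & -27 \end{bmatrix}
\]

Multiply the second equation (second row) by $\tfrac{1}{2}$ to obtain

\[
\begin{aligned} x + y + 2z &= 9 \\ y - \tfrac{7}{2}z &= -\tfrac{17}{2} \\ 3y - 11z &= -27 \end{aligned}
\qquad\qquad
\begin{bmatrix} 1 & 1 & 2 & 9 \\ 0 & 1 & -\tfrac{7}{2} & -\tfrac{17}{2} \\ 0 & 3 & -11 & -27 \end{bmatrix}
\]

Add -3 times the second equation to the third (equivalently, add -3 times the second
row to the third) to obtain

\[
\begin{aligned} x + y + 2z &= 9 \\ y - \tfrac{7}{2}z &= -\tfrac{17}{2} \\ -\tfrac{1}{2}z &= -\tfrac{3}{2} \end{aligned}
\qquad\qquad
\begin{bmatrix} 1 & 1 & 2 & 9 \\ 0 & 1 & -\tfrac{7}{2} & -\tfrac{17}{2} \\ 0 & 0 & -\tfrac{1}{2} & -\tfrac{3}{2} \end{bmatrix}
\]

Multiply the third equation (third row) by -2 to obtain

\[
\begin{aligned} x + y + 2z &= 9 \\ y - \tfrac{7}{2}z &= -\tfrac{17}{2} \\ z &= 3 \end{aligned}
\qquad\qquad
\begin{bmatrix} 1 & 1 & 2 & 9 \\ 0 & 1 & -\tfrac{7}{2} & -\tfrac{17}{2} \\ 0 & 0 & 1 & 3 \end{bmatrix}
\]

Add -1 times the second equation to the first (equivalently, add -1 times the second
row to the first) to obtain

\[
\begin{aligned} x + \tfrac{11}{2}z &= \tfrac{35}{2} \\ y - \tfrac{7}{2}z &= -\tfrac{17}{2} \\ z &= 3 \end{aligned}
\qquad\qquad
\begin{bmatrix} 1 & 0 & \tfrac{11}{2} & \tfrac{35}{2} \\ 0 & 1 & -\tfrac{7}{2} & -\tfrac{17}{2} \\ 0 & 0 & 1 & 3 \end{bmatrix}
\]

Add $-\tfrac{11}{2}$ times the third equation to the first and $\tfrac{7}{2}$ times the third equation to the
second (equivalently, perform the same operations on the rows) to obtain

\[
\begin{aligned} x &= 1 \\ y &= 2 \\ z &= 3 \end{aligned}
\qquad\qquad
\begin{bmatrix} 1 & 0 & 0 & 1 \\ 0 & 1 & 0 & 2 \\ 0 & 0 & 1 & 3 \end{bmatrix}
\]

The solution x = 1, y = 2, z = 3 is now evident.

Note: The solution in this example can also be expressed as the ordered triple (1, 2, 3)
with the understanding that the numbers in the triple are in the same order as the
variables in the system, namely, x, y, z.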
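The row operations of Example 6 can be replayed with exact arithmetic to confirm the result. This is a minimal sketch, not part of the text's method. The helper names are hypothetical and rows are numbered from 0, so "the first row" is row 0.

```python
from fractions import Fraction as F

def add_multiple(m, source, target, c):
    """Add c times row `source` to row `target`."""
    m[target] = [t + c * s for s, t in zip(m[source], m[target])]

def scale_row(m, i, c):
    """Multiply row i through by the nonzero constant c."""
    m[i] = [c * entry for entry in m[i]]

# Augmented matrix of x + y + 2z = 9, 2x + 4y - 3z = 1, 3x + 6y - 5z = 0.
m = [[F(1), F(1), F(2), F(9)],
     [F(2), F(4), F(-3), F(1)],
     [F(3), F(6), F(-5), F(0)]]

add_multiple(m, 0, 1, F(-2))      # -2 times row 1 added to row 2
add_multiple(m, 0, 2, F(-3))      # -3 times row 1 added to row 3
scale_row(m, 1, F(1, 2))          # row 2 multiplied by 1/2
add_multiple(m, 1, 2, F(-3))      # -3 times row 2 added to row 3
scale_row(m, 2, F(-2))            # row 3 multiplied by -2
add_multiple(m, 1, 0, F(-1))      # -1 times row 2 added to row 1
add_multiple(m, 2, 0, F(-11, 2))  # -11/2 times row 3 added to row 1
add_multiple(m, 2, 1, F(7, 2))    # 7/2 times row 3 added to row 2

solution = [row[-1] for row in m]  # last column of the final matrix
print(solution == [1, 2, 3])  # True
```

Using `Fraction` keeps entries such as -7/2 and -17/2 exact, so the final matrix can be compared with the expected one without rounding error.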


Exercise Set 1.1 

1. In each part, determine whether the equation is linear in $x_1$,
$x_2$, and $x_3$.

(a) xi + 5x2 — \/2x 3 = 1 (b) xi + 3x2 + x 3 x 3 = 2 

(c) x\ = — 7x 2 + 3 x 3 (d) xf 2 + X 2 + 8x 3 = 5 

(e) xf — 2x2 + x 3 = 4 (f) trxi — -Jlxi = 7 I/3 


2. In each part, determine whether the equation is linear in x 
and y. 


(a) 2 1/3 x + V3y = 1 
(c) cos (y) x - Ay = log 3 
(e) xy = 1 


(b) 2x 1/3 + 3yy = 1 
(d) y cos x 4y = 0 
(f) y + 7 = x 
3. Using the notation of Formula (7), write down a general linear
system of

(a) two equations in two unknowns. 

(b) three equations in three unknowns. 

(c) two equations in four unknowns. 

4. Write down the augmented matrix for each of the linear sys- 
tems in Exercise 3. 


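In each part of Exercises 5-6, find a linear system in the unknowns
$x_1, x_2, x_3, \ldots$, that corresponds to the given augmented
matrix.

5. (a) $\begin{bmatrix} 2 & 0 & 0 \\ 3 & -4 & 0 \\ 0 & 1 & 1 \end{bmatrix}$

   (b) $\begin{bmatrix} 3 & 0 & -2 & 5 \\ 7 & 1 & 4 & -3 \\ 0 & -2 & 1 & 7 \end{bmatrix}$

6. (a) $\begin{bmatrix} 0 & 3 & -1 & -1 & -1 \\ 5 & 2 & 0 & -3 & -6 \end{bmatrix}$

   (b) $\begin{bmatrix} 3 & 0 & 1 & -4 & 3 \\ -4 & 0 & 1 & 1 & -3 \\ -1 & 3 & 0 & -2 & -9 \\ 0 & 0 & 0 & -1 & -2 \end{bmatrix}$
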
In each part of Exercises 7-8, find the augmented matrix for
the linear system.


7. (a) 

— 2xi 

= 6 



(b) 

6x! 

- 

X2 -f 3 x 3 ; 

— L 


3xi 

= 8 






5x 2 - X 3 ■■ 



9xi 

= -3 








(0 


2x 2 

- 

3 x 4 

+ X5 

= 

0 




— 3xi 

- x 2 

+ -U 



= - 

-1 




6x1 

+ 2x2 

— x 3 + 

2x4 

— 3*5 

= 

6 



8 . (a) 

3xi — 

2X2 = 

-1 


(b) 

2xi 


4 - 2 x 3 

: 1 


4xi 4- 

■ 5x 2 = 

3 



3xi 

- 

1 + 4 x 3 = 

: 7 


7xi 4- 

■ 3x2 = 

2 



6x1 

+ 

X2 - X 3 = 

: 0 

(c) 

Xi 


= 1 








Xl 

= 2 









x 3 

= 3 








9. In each part, determine whether the given 3-tuple is a solution
of the linear system

\[
\begin{aligned}
2x_1 - 4x_2 - x_3 &= 1 \\
x_1 - 3x_2 + x_3 &= 1 \\
3x_1 - 5x_2 - 3x_3 &= 1
\end{aligned}
\]

(a) (3, 1,1) (b) (3,-1, 1) (c) (13,5,2) 

(d) (y ’ f , 2) (e) (17,7,5) 

10. In each part, determine whether the given 3-tuple is a solution 
of the linear system 

x + 2y — 2z = 3 
3x - y + z = 1 
— x + 5y — 5z = 5 


(a) (f , f , 1) (b) (f , f , 0) (c) (5,8,1) 

«>(!.?.?) w (I’* 2 ) 

11. In each part, solve the linear system, if possible, and use the
result to determine whether the lines represented by the equations
in the system have zero, one, or infinitely many points of
intersection. If there is a single point of intersection, give its
coordinates, and if there are infinitely many, find parametric
equations for them.

(a) 3x - 2y = 4
    6x - 4y = 9

(b) 2x - 4y = 1
    4x - 8y = 2

(c) x - 2y = 0
    x - 4y = 8

12. Under what conditions on a and b will the following linear
system have no solutions, one solution, infinitely many solutions?

2x - 3y = a
4x - 6y = b

In each part of Exercises 13-14, use parametric equations to
describe the solution set of the linear equation.

13. (a) 7x - 5y = 3

(b) $3x_1 - 5x_2 + 4x_3 = 7$

(c) $-8x_1 + 2x_2 - 5x_3 + 6x_4 = 1$

(d) 3v - 8w + 2x - y + 4z = 0

14. (a) x + 10y = 2

(b) $x_1 + 3x_2 - 12x_3 = 3$

(c) $4x_1 + 2x_2 + 3x_3 + x_4 = 20$

(d) v + w + x - 5y + 7z = 0

In Exercises 15-16, each linear system has infinitely many
solutions. Use parametric equations to describe its solution set.

15. (a) 2x - 3y = 1
        6x - 9y = 3

(b) $x_1 + 3x_2 - x_3 = -4$
    $3x_1 + 9x_2 - 3x_3 = -12$
    $-x_1 - 3x_2 + x_3 = 4$

16. (a) $6x_1 + 2x_2 = -8$
        $3x_1 + x_2 = -4$

(b) 2x - y + 2z = -4
    6x - 3y + 6z = -12
    -4x + 2y - 4z = 8

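In Exercises 17-18, find a single elementary row operation that
will create a 1 in the upper left corner of the given augmented
matrix and will not create any fractions in its first row.

17. (a) $\begin{bmatrix} -3 & -1 & 2 & 4 \\ 2 & -3 & 3 & 2 \\ 0 & 2 & -3 & 1 \end{bmatrix}$

    (b) $\begin{bmatrix} 0 & -1 & -5 & 0 \\ 2 & -9 & 3 & 2 \\ 1 & 4 & -3 & 3 \end{bmatrix}$

18. (a) $\begin{bmatrix} 2 & 4 & -6 & 8 \\ 7 & 1 & 4 & 3 \\ -5 & 4 & 2 & 7 \end{bmatrix}$

    (b) $\begin{bmatrix} 7 & -4 & -2 & 2 \\ 3 & -1 & 8 & 1 \\ -6 & 3 & -1 & 4 \end{bmatrix}$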
In Exercises 19-20, find all values of k for which the given
augmented matrix corresponds to a consistent linear system.

19. (a) $\begin{bmatrix} 1 & k & -4 \\ 4 & 8 & 2 \end{bmatrix}$

    (b) $\begin{bmatrix} 1 & k & -1 \\ 4 & 8 & -4 \end{bmatrix}$

21. The curve $y = ax^2 + bx + c$ shown in the accompanying figure
passes through the points $(x_1, y_1)$, $(x_2, y_2)$, and $(x_3, y_3)$.
Show that the coefficients a, b, and c form a solution of the
system of linear equations whose augmented matrix is

\[
\begin{bmatrix}
x_1^2 & x_1 & 1 & y_1 \\
x_2^2 & x_2 & 1 & y_2 \\
x_3^2 & x_3 & 1 & y_3
\end{bmatrix}
\]

22. Explain why each of the three elementary row operations does
not affect the solution set of a linear system.

23. Show that if the linear equations

\[ x_1 + kx_2 = c \quad \text{and} \quad x_1 + lx_2 = d \]

have the same solution set, then the two equations are identical
(i.e., k = l and c = d).

24. Consider the system of equations

ax + by = k
cx + dy = l
ex + fy = m

Discuss the relative positions of the lines ax + by = k,
cx + dy = l, and ex + fy = m when

(a) the system has no solutions.

(b) the system has exactly one solution.

(c) the system has infinitely many solutions.

25. Suppose that a certain diet calls for 7 units of fat, 9 units of
protein, and 16 units of carbohydrates for the main meal, and
suppose that an individual has three possible foods to choose
from to meet these requirements:

Food 1: Each ounce contains 2 units of fat, 2 units of
protein, and 4 units of carbohydrates.

Food 2: Each ounce contains 3 units of fat, 1 unit of
protein, and 2 units of carbohydrates.

Food 3: Each ounce contains 1 unit of fat, 3 units of
protein, and 5 units of carbohydrates.

Let x, y, and z denote the number of ounces of the first, second,
and third foods that the dieter will consume at the main
meal. Find (but do not solve) a linear system in x, y, and z
whose solution tells how many ounces of each food must be
consumed to meet the diet requirements.

26. Suppose that you want to find values for a, b, and c such that
the parabola $y = ax^2 + bx + c$ passes through the points
(1, 1), (2, 4), and (-1, 1). Find (but do not solve) a system
of linear equations whose solutions provide values for a, b,
and c. How many solutions would you expect this system of
equations to have, and why?

27. Suppose you are asked to find three real numbers such that the
sum of the numbers is 12, the sum of two times the first plus
the second plus two times the third is 5, and the third number
is one more than the first. Find (but do not solve) a linear
system whose equations describe the three conditions.


True-False Exercises 

TF. In parts (a)-(h) determine whether the statement is true or 
false, and justify your answer. 

(a) A linear system whose equations are all homogeneous must 
be consistent. 

(b) Multiplying a row of an augmented matrix through by zero is 
an acceptable elementary row operation. 

(c) The linear system

$x - y = 3$
$2x - 2y = k$

cannot have a unique solution, regardless of the value of k.

(d) A single linear equation with two or more unknowns must 
have infinitely many solutions. 

(e) If the number of equations in a linear system exceeds the num- 
ber of unknowns, then the system must be inconsistent. 

(f) If each equation in a consistent linear system is multiplied 
through by a constant c, then all solutions to the new system 
can be obtained by multiplying solutions from the original 
system by c. 

(g) Elementary row operations permit one row of an augmented 
matrix to be subtracted from another. 

(h) The linear system with corresponding augmented matrix

\[
\begin{bmatrix} 2 & -1 & 4 \\ 0 & 0 & -1 \end{bmatrix}
\]

is consistent.

Working with Technology 

T1. Solve the linear systems in Examples 2, 3, and 4 to see how
your technology utility handles the three types of systems.

T2. Use the result in Exercise 21 to find values of a, b, and c
for which the curve $y = ax^2 + bx + c$ passes through the points
(-1,1,4), (0, 0. 8), and (1, 1,7).
1.2 Gaussian Elimination

In this section we will develop a systematic procedure for solving systems of linear 
equations. The procedure is based on the idea of performing certain operations on the rows 
of the augmented matrix that simplify it to a form from which the solution of the system 
can be ascertained by inspection. 


Considerations in Solving Linear Systems

When considering methods for solving systems of linear equations, it is important to
distinguish between large systems that must be solved by computer and small systems
that can be solved by hand. For example, there are many applications that lead to
linear systems in thousands or even millions of unknowns. Large systems require special
techniques to deal with issues of memory size, roundoff errors, solution time, and so
forth. Such techniques are studied in the field of numerical analysis and will only be
touched on in this text. However, almost all of the methods that are used for large
systems are based on the ideas that we will develop in this section.

Echelon Forms

In Example 6 of the last section, we solved a linear system in the unknowns x, y, and z
by reducing the augmented matrix to the form

\[
\begin{bmatrix} 1 & 0 & 0 & 1 \\ 0 & 1 & 0 & 2 \\ 0 & 0 & 1 & 3 \end{bmatrix}
\]

from which the solution x = 1, y = 2, z = 3 became evident. This is an example of a
matrix that is in reduced row echelon form. To be of this form, a matrix must have the
following properties:

1. If a row does not consist entirely of zeros, then the first nonzero number in the row
is a 1. We call this a leading 1.

2. If there are any rows that consist entirely of zeros, then they are grouped together at
the bottom of the matrix.

3. In any two successive rows that do not consist entirely of zeros, the leading 1 in the
lower row occurs farther to the right than the leading 1 in the higher row.

4. Each column that contains a leading 1 has zeros everywhere else in that column.

A matrix that has the first three properties is said to be in row echelon form. (Thus, 
a matrix in reduced row echelon form is of necessity in row echelon form, but not 
conversely.) 
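
The four properties above can be checked mechanically. The following is a minimal Python sketch, not part of the original text; the function names `leading_index`, `is_row_echelon`, and `is_reduced_row_echelon` are hypothetical, and a matrix is assumed to be given as a list of rows with exact (integer or fraction) entries.

```python
def leading_index(row):
    """Return the column index of the first nonzero entry, or None for a zero row."""
    for j, entry in enumerate(row):
        if entry != 0:
            return j
    return None


def is_row_echelon(matrix):
    """Check properties 1-3: leading 1's, zero rows at the bottom, staircase pattern."""
    previous_lead = -1
    seen_zero_row = False
    for row in matrix:
        lead = leading_index(row)
        if lead is None:
            seen_zero_row = True
            continue
        if seen_zero_row:          # a nonzero row below a zero row violates property 2
            return False
        if row[lead] != 1:         # property 1: the first nonzero entry must be 1
            return False
        if lead <= previous_lead:  # property 3: leading 1's must move to the right
            return False
        previous_lead = lead
    return True


def is_reduced_row_echelon(matrix):
    """Check property 4 in addition to properties 1-3."""
    if not is_row_echelon(matrix):
        return False
    for i, row in enumerate(matrix):
        lead = leading_index(row)
        if lead is None:
            continue
        # every other entry in a column containing a leading 1 must be zero
        for k, other in enumerate(matrix):
            if k != i and other[lead] != 0:
                return False
    return True
```

For instance, a matrix with rows (1, 4, -3, 7), (0, 1, 6, 2), (0, 0, 1, 5) passes the first check but not the second, because the entries above its leading 1's are not all zero.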


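► EXAMPLE 1 Row Echelon and Reduced Row Echelon Form

The following matrices are in reduced row echelon form.

\[
\begin{bmatrix} 1 & 0 & 0 & 4 \\ 0 & 1 & 0 & 7 \\ 0 & 0 & 1 & -1 \end{bmatrix}, \quad
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad
\begin{bmatrix} 0 & 1 & -2 & 0 & 1 \\ 0 & 0 & 0 & 1 & 3 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}, \quad
\begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}
\]

The following matrices are in row echelon form but not reduced row echelon form.

\[
\begin{bmatrix} 1 & 4 & -3 & 7 \\ 0 & 1 & 6 & 2 \\ 0 & 0 & 1 & 5 \end{bmatrix}, \quad
\begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}, \quad
\begin{bmatrix} 0 & 1 & 2 & 6 & 0 \\ 0 & 0 & 1 & -1 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix}
\]

► EXAMPLE 2 More on Row Echelon and Reduced Row Echelon Form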

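As Example 1 illustrates, a matrix in row echelon form has zeros below each leading 1,
whereas a matrix in reduced row echelon form has zeros below and above each leading
1. Thus, with any real numbers substituted for the *'s, all matrices of the following types
are in row echelon form:

\[
\begin{bmatrix} 1 & * & * & * \\ 0 & 1 & * & * \\ 0 & 0 & 1 & * \\ 0 & 0 & 0 & 1 \end{bmatrix}, \quad
\begin{bmatrix} 1 & * & * & * \\ 0 & 1 & * & * \\ 0 & 0 & 1 & * \\ 0 & 0 & 0 & 0 \end{bmatrix}, \quad
\begin{bmatrix} 1 & * & * & * \\ 0 & 1 & * & * \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix},
\]

\[
\begin{bmatrix}
0 & 1 & * & * & * & * & * & * & * & * \\
0 & 0 & 0 & 1 & * & * & * & * & * & * \\
0 & 0 & 0 & 0 & 1 & * & * & * & * & * \\
0 & 0 & 0 & 0 & 0 & 1 & * & * & * & * \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & *
\end{bmatrix}
\]

All matrices of the following types are in reduced row echelon form:

\[
\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}, \quad
\begin{bmatrix} 1 & 0 & 0 & * \\ 0 & 1 & 0 & * \\ 0 & 0 & 1 & * \\ 0 & 0 & 0 & 0 \end{bmatrix}, \quad
\begin{bmatrix} 1 & 0 & * & * \\ 0 & 1 & * & * \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix},
\]

\[
\begin{bmatrix}
0 & 1 & * & 0 & 0 & 0 & * & * & 0 & * \\
0 & 0 & 0 & 1 & 0 & 0 & * & * & 0 & * \\
0 & 0 & 0 & 0 & 1 & 0 & * & * & 0 & * \\
0 & 0 & 0 & 0 & 0 & 1 & * & * & 0 & * \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & *
\end{bmatrix}
\]
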

If, by a sequence of elementary row operations, the augmented matrix for a system of 
linear equations is put in reduced row echelon form, then the solution set can be obtained 
either by inspection or by converting certain linear equations to parametric form. Here 
are some examples. 


► EXAMPLE 3 Unique Solution 

Suppose that the augmented matrix for a linear system in the unknowns $x_1$, $x_2$, $x_3$, and
$x_4$ has been reduced by elementary row operations to

\[
\begin{bmatrix}
1 & 0 & 0 & 0 & 3 \\
0 & 1 & 0 & 0 & -1 \\
0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 1 & 5
\end{bmatrix}
\]

This matrix is in reduced row echelon form and corresponds to the equations

\[
\begin{aligned}
x_1 &= 3 \\
x_2 &= -1 \\
x_3 &= 0 \\
x_4 &= 5
\end{aligned}
\]

Thus, the system has a unique solution, namely, $x_1 = 3$, $x_2 = -1$, $x_3 = 0$, $x_4 = 5$.

In Example 3 we could, if desired, express the solution more succinctly as the 4-tuple
(3, -1, 0, 5).


► EXAMPLE 4 Linear Systems in Three Unknowns 

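In each part, suppose that the augmented matrix for a linear system in the unknowns
x, y, and z has been reduced by elementary row operations to the given reduced row
echelon form. Solve the system.

(a) $\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 2 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$

(b) $\begin{bmatrix} 1 & 0 & 3 & -1 \\ 0 & 1 & -4 & 2 \\ 0 & 0 & 0 & 0 \end{bmatrix}$

(c) $\begin{bmatrix} 1 & -5 & 1 & 4 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}$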



We will usually denote parameters in a general solution by the letters r, s, t, . . . , but
any letters that do not conflict with the names of the unknowns can be used. For
systems with more than three unknowns, subscripted letters such as $t_1, t_2, t_3, \ldots$
are convenient.


Solution (a) The equation that corresponds to the last row of the augmented matrix is

0x + 0y + 0z = 1

Since this equation is not satisfied by any values of x, y, and z, the system is inconsistent.

Solution (b) The equation that corresponds to the last row of the augmented matrix is

0x + 0y + 0z = 0

This equation can be omitted since it imposes no restrictions on x, y, and z; hence, the
linear system corresponding to the augmented matrix is

x + 3z = -1
y - 4z = 2

Since x and y correspond to the leading 1's in the augmented matrix, we call these
the leading variables. The remaining variables (in this case z) are called free variables.
Solving for the leading variables in terms of the free variables gives

x = -1 - 3z
y = 2 + 4z

From these equations we see that the free variable z can be treated as a parameter and
assigned an arbitrary value t, which then determines values for x and y. Thus, the
solution set can be represented by the parametric equations

x = -1 - 3t, y = 2 + 4t, z = t

By substituting various values for t in these equations we can obtain various solutions
of the system. For example, setting t = 0 yields the solution

x = -1, y = 2, z = 0

and setting t = 1 yields the solution

x = -4, y = 6, z = 1
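
As a quick sanity check on part (b), the parametric equations can be substituted back into the two nonzero equations for several values of t. This is only an illustrative Python sketch; the helper names `solution_4b` and `satisfies_4b` are made up for this purpose and do not come from the text.

```python
from fractions import Fraction


def solution_4b(t):
    """Parametric solution of Example 4(b): x = -1 - 3t, y = 2 + 4t, z = t."""
    return (-1 - 3 * t, 2 + 4 * t, t)


def satisfies_4b(x, y, z):
    """Check the two nonzero equations x + 3z = -1 and y - 4z = 2."""
    return x + 3 * z == -1 and y - 4 * z == 2


# Every parameter value, including a fractional one, should give a solution.
for t in [0, 1, -2, Fraction(1, 3)]:
    assert satisfies_4b(*solution_4b(t))
```

A finite set of samples does not prove that the formulas are a general solution; it only confirms the substitutions worked out in the example.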

Solution (c) As explained in part (b), we can omit the equations corresponding to the
zero rows, in which case the linear system associated with the augmented matrix consists
of the single equation

x - 5y + z = 4 (1)

from which we see that the solution set is a plane in three-dimensional space. Although
(1) is a valid form of the solution set, there are many applications in which it is preferable
to express the solution set in parametric form. We can convert (1) to parametric form
by solving for the leading variable x in terms of the free variables y and z to obtain

x = 4 + 5y - z

From this equation we see that the free variables can be assigned arbitrary values, say
y = s and z = t, which then determine the value of x. Thus, the solution set can be
expressed parametrically as

x = 4 + 5s - t, y = s, z = t (2)

Formulas, such as (2), that express the solution set of a linear system parametrically 
have some associated terminology. 


DEFINITION 1 If a linear system has infinitely many solutions, then a set of parametric
equations from which all solutions can be obtained by assigning numerical values to
the parameters is called a general solution of the system.

Elimination Methods


We have just seen how easy it is to solve a system of linear equations once its augmented
matrix is in reduced row echelon form. Now we will give a step-by-step elimination
procedure that can be used to reduce any matrix to reduced row echelon form. As we
state each step in the procedure, we illustrate the idea by reducing the following matrix
to reduced row echelon form.

\[
\begin{bmatrix}
0 & 0 & -2 & 0 & 7 & 12 \\
2 & 4 & -10 & 6 & 12 & 28 \\
2 & 4 & -5 & 6 & -5 & -1
\end{bmatrix}
\]


Step 1. Locate the leftmost column that does not consist entirely of zeros.

\[
\begin{bmatrix}
0 & 0 & -2 & 0 & 7 & 12 \\
2 & 4 & -10 & 6 & 12 & 28 \\
2 & 4 & -5 & 6 & -5 & -1
\end{bmatrix}
\]

The leftmost nonzero column is the first column.


Step 2. Interchange the top row with another row, if necessary, to bring a nonzero entry
to the top of the column found in Step 1.

\[
\begin{bmatrix}
2 & 4 & -10 & 6 & 12 & 28 \\
0 & 0 & -2 & 0 & 7 & 12 \\
2 & 4 & -5 & 6 & -5 & -1
\end{bmatrix}
\]

The first and second rows in the preceding matrix were interchanged.

Step 3. If the entry that is now at the top of the column found in Step 1 is a, multiply
the first row by 1/a in order to introduce a leading 1.

\[
\begin{bmatrix}
1 & 2 & -5 & 3 & 6 & 14 \\
0 & 0 & -2 & 0 & 7 & 12 \\
2 & 4 & -5 & 6 & -5 & -1
\end{bmatrix}
\]

The first row of the preceding matrix was multiplied by 1/2.

Step 4. Add suitable multiples of the top row to the rows below so that all entries below
the leading 1 become zeros.

\[
\begin{bmatrix}
1 & 2 & -5 & 3 & 6 & 14 \\
0 & 0 & -2 & 0 & 7 & 12 \\
0 & 0 & 5 & 0 & -17 & -29
\end{bmatrix}
\]

-2 times the first row of the preceding matrix was added to the third row.


Step 5. Now cover the top row in the matrix and begin again with Step 1 applied to the
submatrix that remains. Continue in this way until the entire matrix is in row
echelon form.

\[
\begin{bmatrix}
1 & 2 & -5 & 3 & 6 & 14 \\
0 & 0 & -2 & 0 & 7 & 12 \\
0 & 0 & 5 & 0 & -17 & -29
\end{bmatrix}
\]

The leftmost nonzero column in the submatrix (rows 2 and 3) is the third column.

\[
\begin{bmatrix}
1 & 2 & -5 & 3 & 6 & 14 \\
0 & 0 & 1 & 0 & -\tfrac{7}{2} & -6 \\
0 & 0 & 5 & 0 & -17 & -29
\end{bmatrix}
\]

The first row in the submatrix was multiplied by -1/2 to introduce a leading 1.

\[
\begin{bmatrix}
1 & 2 & -5 & 3 & 6 & 14 \\
0 & 0 & 1 & 0 & -\tfrac{7}{2} & -6 \\
0 & 0 & 0 & 0 & \tfrac{1}{2} & 1
\end{bmatrix}
\]

-5 times the first row of the submatrix was added to the second row of the
submatrix to introduce a zero below the leading 1.

\[
\begin{bmatrix}
1 & 2 & -5 & 3 & 6 & 14 \\
0 & 0 & 1 & 0 & -\tfrac{7}{2} & -6 \\
0 & 0 & 0 & 0 & \tfrac{1}{2} & 1
\end{bmatrix}
\]

The top row in the submatrix was covered, and we returned again to Step 1. The
leftmost nonzero column in the new submatrix (row 3) is the fifth column.

\[
\begin{bmatrix}
1 & 2 & -5 & 3 & 6 & 14 \\
0 & 0 & 1 & 0 & -\tfrac{7}{2} & -6 \\
0 & 0 & 0 & 0 & 1 & 2
\end{bmatrix}
\]

The first (and only) row in the new submatrix was multiplied by 2 to
introduce a leading 1.


The entire matrix is now in row echelon form. To find the reduced row echelon form we
need the following additional step.

Step 6. Beginning with the last nonzero row and working upward, add suitable multiples
of each row to the rows above to introduce zeros above the leading 1's.

\[
\begin{bmatrix}
1 & 2 & -5 & 3 & 6 & 14 \\
0 & 0 & 1 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & 1 & 2
\end{bmatrix}
\]

7/2 times the third row of the preceding matrix was added to the second row.

\[
\begin{bmatrix}
1 & 2 & -5 & 3 & 0 & 2 \\
0 & 0 & 1 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & 1 & 2
\end{bmatrix}
\]

-6 times the third row was added to the first row.

\[
\begin{bmatrix}
1 & 2 & 0 & 3 & 0 & 7 \\
0 & 0 & 1 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & 1 & 2
\end{bmatrix}
\]

5 times the second row was added to the first row.

The last matrix is in reduced row echelon form.
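
Steps 1 through 6 translate fairly directly into code. The sketch below is one possible Python rendering, not the text's own program; the name `gauss_jordan` is hypothetical, and exact `Fraction` arithmetic is an assumption made here so that the result matches hand computation without roundoff.

```python
from fractions import Fraction


def gauss_jordan(matrix):
    """Return the reduced row echelon form of `matrix`, using exact fractions."""
    a = [[Fraction(x) for x in row] for row in matrix]
    rows = len(a)
    cols = len(a[0]) if a else 0
    pivots = []   # (row, column) of each leading 1 found in the forward phase
    top = 0       # first row of the submatrix that is not yet "covered"

    # Forward phase (Steps 1-5): produce a row echelon form.
    for col in range(cols):
        if top == rows:
            break
        # Steps 1-2: look for a nonzero entry in this column at or below `top`.
        pivot_row = next((r for r in range(top, rows) if a[r][col] != 0), None)
        if pivot_row is None:
            continue  # the column is zero in the submatrix; move right
        a[top], a[pivot_row] = a[pivot_row], a[top]
        # Step 3: scale the top row of the submatrix to introduce a leading 1.
        pivot = a[top][col]
        a[top] = [x / pivot for x in a[top]]
        # Step 4: add multiples of that row to the rows below to create zeros.
        for r in range(top + 1, rows):
            factor = a[r][col]
            if factor != 0:
                a[r] = [x - factor * y for x, y in zip(a[r], a[top])]
        pivots.append((top, col))
        top += 1  # Step 5: cover the row and repeat on the remaining submatrix

    # Backward phase (Step 6): create zeros above each leading 1, last row first.
    for prow, pcol in reversed(pivots):
        for r in range(prow):
            factor = a[r][pcol]
            if factor != 0:
                a[r] = [x - factor * y for x, y in zip(a[r], a[prow])]
    return a
```

Calling `gauss_jordan` on the 3 x 6 matrix used in the steps above should reproduce the final matrix of Step 6. Stopping after the forward phase would give a row echelon form instead.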

The procedure (or algorithm) we have just described for reducing a matrix to reduced
row echelon form is called Gauss-Jordan elimination. This algorithm consists of two
parts, a forward phase in which zeros are introduced below the leading 1's and a backward
phase in which zeros are introduced above the leading 1's. If only the forward phase is
used, then the procedure produces a row echelon form and is called Gaussian elimination.
For example, in the preceding computations a row echelon form was obtained at the end
of Step 5.

Carl Friedrich Gauss (1777-1855) and Wilhelm Jordan (1842-1899)

Although versions of Gaussian elimination were known much
earlier, its importance in scientific computation became clear when the great
German mathematician Carl Friedrich Gauss used it to help compute the orbit
of the asteroid Ceres from limited data. What happened was this: On January 1,
1801 the Sicilian astronomer and Catholic priest Giuseppe Piazzi (1746-1826)
noticed a dim celestial object that he believed might be a "missing planet." He
named the object Ceres and made a limited number of positional observations
but then lost the object as it neared the Sun. Gauss, then only 24 years old,
undertook the problem of computing the orbit of Ceres from the limited data
using a technique called "least squares," the equations of which he solved by
the method that we now call "Gaussian elimination." The work of Gauss created
a sensation when Ceres reappeared a year later in the constellation Virgo
at almost the precise position that he predicted! The basic idea of the method
was further popularized by the German engineer Wilhelm Jordan in his book
on geodesy (the science of measuring Earth shapes) entitled Handbuch der
Vermessungskunde and published in 1888.

[Images: Photo Inc/Photo Researchers/Getty Images (Gauss);
Leemage/Universal Images Group/Getty Images (Jordan)]


► EXAMPLE 5 Gauss-Jordan Elimination 

Solve by Gauss-Jordan elimination.

\[
\begin{aligned}
x_1 + 3x_2 - 2x_3 \phantom{{}+ 2x_4} + 2x_5 \phantom{{}+ 3x_6} &= 0 \\
2x_1 + 6x_2 - 5x_3 - 2x_4 + 4x_5 - 3x_6 &= -1 \\
5x_3 + 10x_4 \phantom{{}+ 4x_5} + 15x_6 &= 5 \\
2x_1 + 6x_2 \phantom{{}- 5x_3} + 8x_4 + 4x_5 + 18x_6 &= 6
\end{aligned}
\]

Solution The augmented matrix for the system is

\[
\begin{bmatrix}
1 & 3 & -2 & 0 & 2 & 0 & 0 \\
2 & 6 & -5 & -2 & 4 & -3 & -1 \\
0 & 0 & 5 & 10 & 0 & 15 & 5 \\
2 & 6 & 0 & 8 & 4 & 18 & 6
\end{bmatrix}
\]

Adding -2 times the first row to the second and fourth rows gives

\[
\begin{bmatrix}
1 & 3 & -2 & 0 & 2 & 0 & 0 \\
0 & 0 & -1 & -2 & 0 & -3 & -1 \\
0 & 0 & 5 & 10 & 0 & 15 & 5 \\
0 & 0 & 4 & 8 & 0 & 18 & 6
\end{bmatrix}
\]

Multiplying the second row by -1 and then adding -5 times the new second row to the
third row and -4 times the new second row to the fourth row gives

\[
\begin{bmatrix}
1 & 3 & -2 & 0 & 2 & 0 & 0 \\
0 & 0 & 1 & 2 & 0 & 3 & 1 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 6 & 2
\end{bmatrix}
\]

Interchanging the third and fourth rows and then multiplying the third row of the
resulting matrix by 1/6 gives the row echelon form

\[
\begin{bmatrix}
1 & 3 & -2 & 0 & 2 & 0 & 0 \\
0 & 0 & 1 & 2 & 0 & 3 & 1 \\
0 & 0 & 0 & 0 & 0 & 1 & \tfrac{1}{3} \\
0 & 0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}
\]

This completes the forward phase since there are zeros below the leading 1's.

Adding -3 times the third row to the second row and then adding 2 times the second
row of the resulting matrix to the first row yields the reduced row echelon form

\[
\begin{bmatrix}
1 & 3 & 0 & 4 & 2 & 0 & 0 \\
0 & 0 & 1 & 2 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & \tfrac{1}{3} \\
0 & 0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}
\]

This completes the backward phase since there are zeros above the leading 1's.

The corresponding system of equations is

\[
\begin{aligned}
x_1 + 3x_2 + 4x_4 + 2x_5 &= 0 \\
x_3 + 2x_4 &= 0 \\
x_6 &= \tfrac{1}{3}
\end{aligned}
\tag{3}
\]

Note that in constructing the linear system in (3) we ignored the row of zeros in the
corresponding augmented matrix. Why is this justified?

Solving for the leading variables, we obtain

\[
\begin{aligned}
x_1 &= -3x_2 - 4x_4 - 2x_5 \\
x_3 &= -2x_4 \\
x_6 &= \tfrac{1}{3}
\end{aligned}
\]

Finally, we express the general solution of the system parametrically by assigning the
free variables $x_2$, $x_4$, and $x_5$ arbitrary values r, s, and t, respectively. This yields

\[
x_1 = -3r - 4s - 2t, \quad x_2 = r, \quad x_3 = -2s, \quad x_4 = s, \quad x_5 = t, \quad x_6 = \tfrac{1}{3}
\]

Homogeneous Linear Systems


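A system of linear equations is said to be homogeneous if the constant terms are all zero;
that is, the system has the form

\[
\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= 0 \\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= 0 \\
&\;\;\vdots \\
a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n &= 0
\end{aligned}
\]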


Every homogeneous system of linear equations is consistent because all such systems
have $x_1 = 0$, $x_2 = 0$, . . . , $x_n = 0$ as a solution. This solution is called the trivial solution;
if there are other solutions, they are called nontrivial solutions.

Because a homogeneous linear system always has the trivial solution, there are only
two possibilities for its solutions:

• The system has only the trivial solution.

• The system has infinitely many solutions in addition to the trivial solution.

In the special case of a homogeneous linear system of two equations in two unknowns,
say

\[
\begin{aligned}
a_1x + b_1y &= 0 \quad (a_1, b_1 \text{ not both zero}) \\
a_2x + b_2y &= 0 \quad (a_2, b_2 \text{ not both zero})
\end{aligned}
\]

the graphs of the equations are lines through the origin, and the trivial solution
corresponds to the point of intersection at the origin (Figure 1.2.1).


► Figure 1.2.1 [Two panels: "Only the trivial solution" and "Infinitely many solutions"]


There is one case in which a homogeneous system is assured of having nontrivial
solutions, namely, whenever the system involves more unknowns than equations. To
see why, consider the following example of four equations in six unknowns.

Free Variables in Homogeneous Linear Systems

► EXAMPLE 6 A Homogeneous System

Use Gauss-Jordan elimination to solve the homogeneous linear system

\[
\begin{aligned}
x_1 + 3x_2 - 2x_3 \phantom{{}+ 2x_4} + 2x_5 \phantom{{}+ 3x_6} &= 0 \\
2x_1 + 6x_2 - 5x_3 - 2x_4 + 4x_5 - 3x_6 &= 0 \\
5x_3 + 10x_4 \phantom{{}+ 4x_5} + 15x_6 &= 0 \\
2x_1 + 6x_2 \phantom{{}- 5x_3} + 8x_4 + 4x_5 + 18x_6 &= 0
\end{aligned}
\tag{4}
\]

Solution Observe first that the coefficients of the unknowns in this system are the same
as those in Example 5; that is, the two systems differ only in the constants on the right
side. The augmented matrix for the given homogeneous system is

\[
\begin{bmatrix}
1 & 3 & -2 & 0 & 2 & 0 & 0 \\
2 & 6 & -5 & -2 & 4 & -3 & 0 \\
0 & 0 & 5 & 10 & 0 & 15 & 0 \\
2 & 6 & 0 & 8 & 4 & 18 & 0
\end{bmatrix}
\tag{5}
\]

which is the same as the augmented matrix for the system in Example 5, except for zeros
in the last column. Thus, the reduced row echelon form of this matrix will be the same
as that of the augmented matrix in Example 5, except for the last column. However,
a moment's reflection will make it evident that a column of zeros is not changed by an
elementary row operation, so the reduced row echelon form of (5) is

\[
\begin{bmatrix}
1 & 3 & 0 & 4 & 2 & 0 & 0 \\
0 & 0 & 1 & 2 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}
\tag{6}
\]

The corresponding system of equations is

\[
\begin{aligned}
x_1 + 3x_2 + 4x_4 + 2x_5 &= 0 \\
x_3 + 2x_4 &= 0 \\
x_6 &= 0
\end{aligned}
\]

Solving for the leading variables, we obtain

\[
\begin{aligned}
x_1 &= -3x_2 - 4x_4 - 2x_5 \\
x_3 &= -2x_4 \\
x_6 &= 0
\end{aligned}
\tag{7}
\]

If we now assign the free variables $x_2$, $x_4$, and $x_5$ arbitrary values r, s, and t, respectively,
then we can express the solution set parametrically as

\[
x_1 = -3r - 4s - 2t, \quad x_2 = r, \quad x_3 = -2s, \quad x_4 = s, \quad x_5 = t, \quad x_6 = 0
\]

Note that the trivial solution results when r = s = t = 0.
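
The split into leading and free variables used in Example 6 can be read off from the reduced row echelon form (6) by locating the first nonzero entry of each nonzero row. A small Python sketch follows; the function name `leading_and_free_variables` and the 1-based variable numbering are choices made for this illustration only.

```python
def leading_and_free_variables(rref, num_unknowns):
    """Return 1-based indices of leading and free variables.

    `rref` is an augmented matrix in reduced row echelon form whose first
    `num_unknowns` columns hold the coefficients of the unknowns.
    """
    leading = []
    for row in rref:
        for j in range(num_unknowns):
            if row[j] != 0:
                leading.append(j + 1)  # column of the leading 1 in this row
                break
    free = [j for j in range(1, num_unknowns + 1) if j not in leading]
    return leading, free


# Reduced row echelon form (6) from Example 6.
rref_6 = [
    [1, 3, 0, 4, 2, 0, 0],
    [0, 0, 1, 2, 0, 0, 0],
    [0, 0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
leading, free = leading_and_free_variables(rref_6, 6)
```

Here the three nonzero rows give three leading variables, and the remaining three of the six unknowns are free, which is why three parameters r, s, and t appear in the solution.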


Example 6 illustrates two important points about solving homogeneous linear systems:

1. Elementary row operations do not alter columns of zeros in a matrix, so the reduced
row echelon form of the augmented matrix for a homogeneous linear system has
a final column of zeros. This implies that the linear system corresponding to the
reduced row echelon form is homogeneous, just like the original system.

2. When we constructed the homogeneous linear system corresponding to augmented
matrix (6), we ignored the row of zeros because the corresponding equation

\[
0x_1 + 0x_2 + 0x_3 + 0x_4 + 0x_5 + 0x_6 = 0
\]

does not impose any conditions on the unknowns. Thus, depending on whether or
not the reduced row echelon form of the augmented matrix for a homogeneous linear
system has any rows of zeros, the linear system corresponding to that reduced row
echelon form will either have the same number of equations as the original system
or it will have fewer.

Now consider a general homogeneous linear system with n unknowns, and suppose
that the reduced row echelon form of the augmented matrix has r nonzero rows. Since
each nonzero row has a leading 1, and since each leading 1 corresponds to a leading
variable, the homogeneous system corresponding to the reduced row echelon form of
the augmented matrix must have r leading variables and n - r free variables. Thus, this
system is of the form

\[
\begin{aligned}
x_{k_1} + \textstyle\sum(\ ) &= 0 \\
x_{k_2} + \textstyle\sum(\ ) &= 0 \\
&\;\;\vdots \\
x_{k_r} + \textstyle\sum(\ ) &= 0
\end{aligned}
\tag{8}
\]

where in each equation the expression $\sum(\ )$ denotes a sum that involves the free variables,
if any [see (7), for example]. In summary, we have the following result.

THEOREM 1.2.1 Free Variable Theorem for Homogeneous Systems

If a homogeneous linear system has n unknowns, and if the reduced row echelon form
of its augmented matrix has r nonzero rows, then the system has n - r free variables.

Theorem 1.2.1 has an important implication for homogeneous linear systems with
more unknowns than equations. Specifically, if a homogeneous linear system has m
equations in n unknowns, and if m < n, then it must also be true that r < n (why?).
This being the case, the theorem implies that there is at least one free variable, and this
implies that the system has infinitely many solutions. Thus, we have the following result.

THEOREM 1.2.2 A homogeneous linear system with more unknowns than equations has
infinitely many solutions.

Note that Theorem 1.2.2 applies only to homogeneous systems; a nonhomogeneous
system with more unknowns than equations need not be consistent. However, we will
prove later that if a nonhomogeneous system with more unknowns than equations is
consistent, then it has infinitely many solutions.

In retrospect, we could have anticipated that the homogeneous system in Example 6
would have infinitely many solutions since it has four equations in six unknowns.

Gaussian Elimination and Back-Substitution

For small linear systems that are solved by hand (such as most of those in this text),
Gauss-Jordan elimination (reduction to reduced row echelon form) is a good procedure
to use. However, for large linear systems that require a computer solution, it is generally
more efficient to use Gaussian elimination (reduction to row echelon form) followed by
a technique known as back-substitution to complete the process of solving the system.
The next example illustrates this technique.

► EXAMPLE 7 Example 5 Solved by Back-Substitution

From the computations in Example 5, a row echelon form of the augmented matrix is

\[
\begin{bmatrix}
1 & 3 & -2 & 0 & 2 & 0 & 0 \\
0 & 0 & 1 & 2 & 0 & 3 & 1 \\
0 & 0 & 0 & 0 & 0 & 1 & \tfrac{1}{3} \\
0 & 0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}
\]

To solve the corresponding system of equations

\[
\begin{aligned}
x_1 + 3x_2 - 2x_3 + 2x_5 &= 0 \\
x_3 + 2x_4 + 3x_6 &= 1 \\
x_6 &= \tfrac{1}{3}
\end{aligned}
\]

we proceed as follows:

Step 1. Solve the equations for the leading variables.

\[
\begin{aligned}
x_1 &= -3x_2 + 2x_3 - 2x_5 \\
x_3 &= 1 - 2x_4 - 3x_6 \\
x_6 &= \tfrac{1}{3}
\end{aligned}
\]

Step 2. Beginning with the bottom equation and working upward, successively substitute
each equation into all the equations above it.

Substituting $x_6 = \tfrac{1}{3}$ into the second equation yields

\[
\begin{aligned}
x_1 &= -3x_2 + 2x_3 - 2x_5 \\
x_3 &= -2x_4 \\
x_6 &= \tfrac{1}{3}
\end{aligned}
\]

Substituting $x_3 = -2x_4$ into the first equation yields

\[
\begin{aligned}
x_1 &= -3x_2 - 4x_4 - 2x_5 \\
x_3 &= -2x_4 \\
x_6 &= \tfrac{1}{3}
\end{aligned}
\]

Step 3. Assign arbitrary values to the free variables, if any.

If we now assign $x_2$, $x_4$, and $x_5$ the arbitrary values r, s, and t, respectively, the
general solution is given by the formulas

\[
x_1 = -3r - 4s - 2t, \quad x_2 = r, \quad x_3 = -2s, \quad x_4 = s, \quad x_5 = t, \quad x_6 = \tfrac{1}{3}
\]

This agrees with the solution obtained in Example 5.
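
One way to gain confidence in a general solution such as this one is to substitute it back into the original equations for several parameter values. The Python sketch below does that for the system of Example 5; the names `AUGMENTED_5`, `general_solution`, and `satisfies` are introduced here for illustration and are not part of the text.

```python
from fractions import Fraction

# Augmented matrix of the system in Example 5 (last column holds the constants).
AUGMENTED_5 = [
    [1, 3, -2, 0, 2, 0, 0],
    [2, 6, -5, -2, 4, -3, -1],
    [0, 0, 5, 10, 0, 15, 5],
    [2, 6, 0, 8, 4, 18, 6],
]


def general_solution(r, s, t):
    """General solution found in Examples 5 and 7, as a list [x1, ..., x6]."""
    return [-3 * r - 4 * s - 2 * t, r, -2 * s, s, t, Fraction(1, 3)]


def satisfies(augmented, x):
    """Return True if x satisfies every equation of the augmented matrix."""
    return all(
        sum(c * v for c, v in zip(row[:-1], x)) == row[-1]
        for row in augmented
    )


# Try a few parameter choices, including negative and fractional values.
for r, s, t in [(0, 0, 0), (1, 2, 3), (-4, Fraction(1, 2), 7)]:
    assert satisfies(AUGMENTED_5, general_solution(r, s, t))
```

Such spot checks catch arithmetic slips in the elimination, although they do not by themselves show that every solution has been found.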


► EXAMPLE 8

Suppose that the matrices below are augmented matrices for linear systems in the
unknowns $x_1$, $x_2$, $x_3$, and $x_4$. These matrices are all in row echelon form but not reduced row
echelon form. Discuss the existence and uniqueness of solutions to the corresponding
linear systems.

(a) $\begin{bmatrix} 1 & -3 & 7 & 2 & 5 \\ 0 & 1 & 2 & -4 & 1 \\ 0 & 0 & 1 & 6 & 9 \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix}$

(b) $\begin{bmatrix} 1 & -3 & 7 & 2 & 5 \\ 0 & 1 & 2 & -4 & 1 \\ 0 & 0 & 1 & 6 & 9 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}$

(c) $\begin{bmatrix} 1 & -3 & 7 & 2 & 5 \\ 0 & 1 & 2 & -4 & 1 \\ 0 & 0 & 1 & 6 & 9 \\ 0 & 0 & 0 & 1 & 0 \end{bmatrix}$



Solution (a) The last row corresponds to the equation

\[
0x_1 + 0x_2 + 0x_3 + 0x_4 = 1
\]

from which it is evident that the system is inconsistent.

Solution (b) The last row corresponds to the equation

\[
0x_1 + 0x_2 + 0x_3 + 0x_4 = 0
\]

which has no effect on the solution set. In the remaining three equations the variables
$x_1$, $x_2$, and $x_3$ correspond to leading 1's and hence are leading variables. The variable $x_4$
is a free variable. With a little algebra, the leading variables can be expressed in terms
of the free variable, and the free variable can be assigned an arbitrary value. Thus, the
system must have infinitely many solutions.

Solution (c) The last row corresponds to the equation

\[
x_4 = 0
\]

which gives us a numerical value for $x_4$. If we substitute this value into the third equation,
namely,

\[
x_3 + 6x_4 = 9
\]

we obtain $x_3 = 9$. You should now be able to see that if we continue this process and
substitute the known values of $x_3$ and $x_4$ into the equation corresponding to the second
row, we will obtain a unique numerical value for $x_2$; and if, finally, we substitute the
known values of $x_4$, $x_3$, and $x_2$ into the equation corresponding to the first row, we will
produce a unique numerical value for $x_1$. Thus, the system has a unique solution.
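
The substitution process described in part (c) can be carried out in a few lines. The sketch below is a hypothetical helper, `back_substitute`, which assumes a square system whose augmented matrix is in row echelon form with a leading 1 in every diagonal position, as in part (c); it is not intended to handle free variables or inconsistent systems.

```python
from fractions import Fraction


def back_substitute(ref):
    """Solve a system whose n x (n+1) augmented matrix is in row echelon form
    with a leading 1 in each diagonal position (the unique-solution case)."""
    n = len(ref)
    x = [Fraction(0)] * n
    # Work upward from the bottom equation, substituting known values.
    for i in range(n - 1, -1, -1):
        known = sum(Fraction(ref[i][j]) * x[j] for j in range(i + 1, n))
        x[i] = Fraction(ref[i][n]) - known  # the leading coefficient is 1
    return x


# Augmented matrix from Example 8(c).
matrix_8c = [
    [1, -3, 7, 2, 5],
    [0, 1, 2, -4, 1],
    [0, 0, 1, 6, 9],
    [0, 0, 0, 1, 0],
]
solution_8c = back_substitute(matrix_8c)
```

The values of $x_4$ and $x_3$ produced this way agree with those derived in the text, and the remaining two follow from the second and first rows.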

Some Facts About Echelon Forms

There are three facts about row echelon forms and reduced row echelon forms that are
important to know but we will not prove:

1. Every matrix has a unique reduced row echelon form; that is, regardless of whether
you use Gauss-Jordan elimination or some other sequence of elementary row
operations, the same reduced row echelon form will result in the end.

2. Row echelon forms are not unique; that is, different sequences of elementary row
operations can result in different row echelon forms.

3. Although row echelon forms are not unique, the reduced row echelon form and all
row echelon forms of a matrix A have the same number of zero rows, and the leading
1's always occur in the same positions. Those are called the pivot positions of A. A
column that contains a pivot position is called a pivot column of A.

A proof of the uniqueness result (fact 1) can be found in the article "The Reduced Row
Echelon Form of a Matrix Is Unique: A Simple Proof," by Thomas Yuster, Mathematics
Magazine, Vol. 57, No. 2, 1984, pp. 93-94.

► EXAMPLE 9 Pivot Positions and Columns

Earlier in this section (immediately after Definition 1) we found a row echelon form of

\[
A = \begin{bmatrix}
0 & 0 & -2 & 0 & 7 & 12 \\
2 & 4 & -10 & 6 & 12 & 28 \\
2 & 4 & -5 & 6 & -5 & -1
\end{bmatrix}
\]

to be

\[
\begin{bmatrix}
1 & 2 & -5 & 3 & 6 & 14 \\
0 & 0 & 1 & 0 & -\tfrac{7}{2} & -6 \\
0 & 0 & 0 & 0 & 1 & 2
\end{bmatrix}
\]

The leading 1's occur in positions (row 1, column 1), (row 2, column 3), and (row 3,
column 5). These are the pivot positions. The pivot columns are columns 1, 3, and 5.

If A is the augmented matrix for a linear system, then the pivot columns identify the
leading variables. As an illustration, in Example 5 the pivot columns are 1, 3, and 6, and
the leading variables are $x_1$, $x_3$, and $x_6$.
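
Given any row echelon form, the pivot positions can be listed by scanning each row for its first nonzero entry. The following Python sketch shows the idea; `pivot_positions` is a hypothetical name, and positions are reported as 1-based (row, column) pairs to match the wording of Example 9.

```python
from fractions import Fraction


def pivot_positions(echelon):
    """Return 1-based (row, column) positions of the leading entries of a
    matrix that is already in row echelon form."""
    positions = []
    for i, row in enumerate(echelon):
        for j, entry in enumerate(row):
            if entry != 0:
                positions.append((i + 1, j + 1))
                break  # only the first nonzero entry of a row is a pivot
    return positions


# Row echelon form of A from Example 9.
ref_A = [
    [1, 2, -5, 3, 6, 14],
    [0, 0, 1, 0, Fraction(-7, 2), -6],
    [0, 0, 0, 0, 1, 2],
]
positions = pivot_positions(ref_A)
pivot_columns = [col for _, col in positions]
```

Because every row echelon form of A has its leading 1's in the same places, the same list results whichever echelon form is supplied.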


Roundoff Error and Instability

There is often a gap between mathematical theory and its practical implementation,
Gauss-Jordan elimination and Gaussian elimination being good examples. The problem
is that computers generally approximate numbers, thereby introducing roundoff errors,
so unless precautions are taken, successive calculations may degrade an answer to a
degree that makes it useless. Algorithms (procedures) in which this happens are called
unstable. There are various techniques for minimizing roundoff error and instability.
For example, it can be shown that for large linear systems Gauss-Jordan elimination
involves roughly 50% more operations than Gaussian elimination, so most computer
algorithms are based on the latter method. Some of these matters will be considered in
Chapter 9.
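
A small floating-point experiment shows how roundoff can ruin an answer. The Python sketch below is an illustration chosen for this purpose and is not taken from the text: it solves a 2 x 2 system with a very small entry in the upper left corner, first eliminating in the given row order and then after interchanging the two rows. The function name `solve_2x2` and the test system are assumptions.

```python
def solve_2x2(a, b, swap_rows):
    """Gaussian elimination with back-substitution on a 2 x 2 system,
    carried out in ordinary floating-point arithmetic."""
    (a11, a12), (a21, a22) = a
    b1, b2 = b
    if swap_rows:
        # Interchange the two equations before eliminating.
        a11, a12, a21, a22 = a21, a22, a11, a12
        b1, b2 = b2, b1
    m = a21 / a11          # multiplier used to clear the entry below a11
    a22 -= m * a12
    b2 -= m * b1
    x2 = b2 / a22          # back-substitution
    x1 = (b1 - a12 * x2) / a11
    return x1, x2


# System: 1e-20*x1 + x2 = 1 and x1 + x2 = 2; the exact solution is close to (1, 1).
coefficients = [[1e-20, 1.0], [1.0, 1.0]]
constants = [1.0, 2.0]
without_swap = solve_2x2(coefficients, constants, swap_rows=False)
with_swap = solve_2x2(coefficients, constants, swap_rows=True)
```

In the given row order the huge multiplier swamps the other entries and the computed value of x1 comes out as 0, whereas after the row interchange both unknowns are computed as 1. Choosing row interchanges to avoid small leading entries is one of the precautions alluded to above.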


Exercise Set 1.2 

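In Exercises 1-2, determine whether the matrix is in row echelon
form, reduced row echelon form, both, or neither.

1. (a) $\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$

(b) $\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}$

(c) $\begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix}$

(d) $\begin{bmatrix} 1 & 0 & 3 & 1 \\ 0 & 1 & 2 & 4 \end{bmatrix}$

(e) $\begin{bmatrix} 1 & 2 & 0 & 3 & 0 \\ 0 & 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}$

(f) $\begin{bmatrix} 0 & 0 \\ 0 & 0 \\ 0 & 0 \end{bmatrix}$

(g) $\begin{bmatrix} 1 & -7 & 5 & 5 \\ 0 & 1 & 3 & 2 \end{bmatrix}$

2. (a) $\begin{bmatrix} 1 & 2 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}$

(b) $\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 2 & 0 \end{bmatrix}$

(c) $\begin{bmatrix} 1 & 3 & 4 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix}$

(d) $\begin{bmatrix} 1 & 5 & -3 \\ 0 & 1 & 1 \\ 0 & 0 & 0 \end{bmatrix}$

(e) $\begin{bmatrix} 1 & 2 & 3 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}$

(f) $\begin{bmatrix} 1 & 2 & 3 & 4 & 5 \\ 1 & 0 & 7 & 1 & 3 \\ 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}$

(g) $\begin{bmatrix} 1 & -2 & 0 & 1 \\ 0 & 0 & 1 & -2 \end{bmatrix}$

In Exercises 3-4, suppose that the augmented matrix for a linear
system has been reduced by row operations to the given row
echelon form. Solve the system.

3. (a) $\begin{bmatrix} 1 & -3 & 4 & 7 \\ 0 & 1 & 2 & 2 \\ 0 & 0 & 1 & 5 \end{bmatrix}$

(b) $\begin{bmatrix} 1 & 0 & 8 & -5 & 6 \\ 0 & 1 & 4 & -9 & 3 \\ 0 & 0 & 1 & 1 & 2 \end{bmatrix}$

(c) $\begin{bmatrix} 1 & 7 & -2 & 0 & -8 & -3 \\ 0 & 0 & 1 & 1 & 6 & 5 \\ 0 & 0 & 0 & 1 & 3 & 9 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}$

(d) $\begin{bmatrix} 1 & -3 & 7 & 1 \\ 0 & 1 & 4 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$

4. (a) $\begin{bmatrix} 1 & 0 & 0 & -3 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 7 \end{bmatrix}$

(b) $\begin{bmatrix} 1 & 0 & 0 & -7 & 8 \\ 0 & 1 & 0 & 3 & 2 \\ 0 & 0 & 1 & 1 & -5 \end{bmatrix}$

(c) $\begin{bmatrix} 1 & -6 & 0 & 0 & 3 & -2 \\ 0 & 0 & 1 & 0 & 4 & 7 \\ 0 & 0 & 0 & 1 & 5 & 8 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}$

(d) $\begin{bmatrix} 1 & -3 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$
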

In Exercises 5-8, solve the linear system by Gaussian elimination.

5. \[ \begin{aligned} x_1 + x_2 + 2x_3 &= 8 \\ -x_1 - 2x_2 + 3x_3 &= 1 \\ 3x_1 - 7x_2 + 4x_3 &= 10 \end{aligned} \]

6. \[ \begin{aligned} 2x_1 + 2x_2 + 2x_3 &= 0 \\ -2x_1 + 5x_2 + 2x_3 &= 1 \\ 8x_1 + x_2 + 4x_3 &= -1 \end{aligned} \]

7. \[ \begin{aligned} x - y + 2z - w &= -1 \\ 2x + y - 2z - 2w &= -2 \\ -x + 2y - 4z + w &= 1 \\ 3x - 3w &= -3 \end{aligned} \]

8. \[ \begin{aligned} -2b + 3c &= 1 \\ 3a + 6b - 3c &= -2 \\ 6a + 6b + 3c &= 5 \end{aligned} \]

In Exercises 9-12, solve the linear system by Gauss-Jordan elimination.

9. Exercise 5    10. Exercise 6

11. Exercise 7    12. Exercise 8

In Exercises 13-14, determine whether the homogeneous system has nontrivial solutions by
inspection (without pencil and paper).

13. \[ \begin{aligned} 2x_1 - 3x_2 + 4x_3 - x_4 &= 0 \\ 7x_1 + x_2 - 8x_3 + 9x_4 &= 0 \\ 2x_1 + 8x_2 + x_3 - x_4 &= 0 \end{aligned} \]

14. \[ \begin{aligned} x_1 + 3x_2 - x_3 &= 0 \\ x_2 - 8x_3 &= 0 \\ 4x_3 &= 0 \end{aligned} \]

In Exercises 15-22, solve the given linear system by any method.

15. \[ \begin{aligned} 2x_1 + x_2 + 3x_3 &= 0 \\ x_1 + 2x_2 &= 0 \\ x_2 + x_3 &= 0 \end{aligned} \]

16. \[ \begin{aligned} 2x - y - 3z &= 0 \\ -x + 2y - 3z &= 0 \\ x + y + 4z &= 0 \end{aligned} \]

17. \[ \begin{aligned} 3x_1 + x_2 + x_3 + x_4 &= 0 \\ 5x_1 - x_2 + x_3 - x_4 &= 0 \end{aligned} \]

18. \[ \begin{aligned} v + 3w - 2x &= 0 \\ 2u + v - 4w + 3x &= 0 \\ 2u + 3v + 2w - x &= 0 \\ -4u - 3v + 5w - 4x &= 0 \end{aligned} \]

19. \[ \begin{aligned} 2x + 2y + 4z &= 0 \\ w - y - 3z &= 0 \\ 2w + 3x + y + z &= 0 \\ -2w + x + 3y - 2z &= 0 \end{aligned} \]

20. \[ \begin{aligned} x_1 + 3x_2 + x_4 &= 0 \\ x_1 + 4x_2 + 2x_3 &= 0 \\ -2x_2 - 2x_3 - x_4 &= 0 \\ 2x_1 - 4x_2 + x_3 + x_4 &= 0 \\ x_1 - 2x_2 - x_3 + x_4 &= 0 \end{aligned} \]

21. \[ \begin{aligned} 2I_1 - I_2 + 3I_3 + 4I_4 &= 9 \\ I_1 - 2I_3 + 7I_4 &= 11 \\ 3I_1 - 3I_2 + I_3 + 5I_4 &= 8 \\ 2I_1 + I_2 + 4I_3 + 4I_4 &= 10 \end{aligned} \]

22. \[ \begin{aligned} Z_3 + Z_4 + Z_5 &= 0 \\ -Z_1 - Z_2 + 2Z_3 - 3Z_4 + Z_5 &= 0 \\ Z_1 + Z_2 - 2Z_3 - Z_5 &= 0 \\ 2Z_1 + 2Z_2 - Z_3 + Z_5 &= 0 \end{aligned} \]

In each part of Exercises 23-24, the augmented matrix for a linear system is given in
which the asterisk represents an unspecified real number. Determine whether the system is
consistent, and if so whether the solution is unique. Answer "inconclusive" if there is
not enough information to make a decision.

23. (a) $\begin{bmatrix} 1&*&*&*\\ 0&1&*&*\\ 0&0&1&* \end{bmatrix}$
    (b) $\begin{bmatrix} 1&*&*&*\\ 0&1&*&*\\ 0&0&0&0 \end{bmatrix}$
    (c) $\begin{bmatrix} 1&*&*&*\\ 0&1&*&*\\ 0&0&0&1 \end{bmatrix}$
    (d) $\begin{bmatrix} 1&*&*&*\\ 0&0&*&0\\ 0&0&1&* \end{bmatrix}$

24. (a) $\begin{bmatrix} 1&*&*&*\\ 0&1&*&*\\ 0&0&1&1 \end{bmatrix}$
    (b) $\begin{bmatrix} 1&0&0&*\\ *&1&0&*\\ *&*&1&* \end{bmatrix}$
    (c) $\begin{bmatrix} 1&0&0&0\\ 1&0&0&1\\ 1&*&*&* \end{bmatrix}$
    (d) $\begin{bmatrix} 1&*&*&*\\ 1&0&0&1\\ 1&0&0&1 \end{bmatrix}$

In Exercises 25-26, determine the values of a for which the system has no solutions,
exactly one solution, or infinitely many solutions.

25. \[ \begin{aligned} x + 2y - 3z &= 4 \\ 3x - y + 5z &= 2 \\ 4x + y + (a^2 - 14)z &= a + 2 \end{aligned} \]

26. \[ \begin{aligned} x + 2y + z &= 2 \\ 2x - 2y + 3z &= 1 \\ x + 2y - (a^2 - 3)z &= a \end{aligned} \]

In Exercises 27-28, what condition, if any, must a, b, and c satisfy for the linear
system to be consistent?

27. \[ \begin{aligned} x + 3y - z &= a \\ x + y + 2z &= b \\ 2y - 3z &= c \end{aligned} \]

28. \[ \begin{aligned} x + 3y + z &= a \\ -x - 2y + z &= b \\ 3x + 7y - z &= c \end{aligned} \]

In Exercises 29-30, solve the following systems, where a, b, and c are constants.

29. \[ \begin{aligned} 2x + y &= a \\ 3x + 6y &= b \end{aligned} \]

30. \[ \begin{aligned} x_1 + x_2 + x_3 &= a \\ 2x_1 + 2x_3 &= b \\ 3x_2 + 3x_3 &= c \end{aligned} \]

31. Find two different row echelon forms of

\[ \begin{bmatrix} 1 & 3 \\ 2 & 7 \end{bmatrix} \]

This exercise shows that a matrix can have multiple row echelon forms.

32. Reduce

\[ \begin{bmatrix} 2 & 1 & 3 \\ 0 & -2 & -29 \\ 3 & 4 & 5 \end{bmatrix} \]

to reduced row echelon form without introducing fractions at any intermediate stage.

33. Show that the following nonlinear system has 18 solutions if
$0 \le \alpha \le 2\pi$, $0 \le \beta \le 2\pi$, and $0 \le \gamma \le 2\pi$.

\[ \begin{aligned} \sin\alpha + 2\cos\beta + 3\tan\gamma &= 0 \\ 2\sin\alpha + 5\cos\beta + 3\tan\gamma &= 0 \\ -\sin\alpha - 5\cos\beta + 5\tan\gamma &= 0 \end{aligned} \]

[Hint: Begin by making the substitutions $x = \sin\alpha$, $y = \cos\beta$, and
$z = \tan\gamma$.]

34. Solve the following system of nonlinear equations for the unknown angles $\alpha$,
$\beta$, and $\gamma$, where $0 \le \alpha \le 2\pi$, $0 \le \beta \le 2\pi$, and
$0 \le \gamma < \pi$.

\[ \begin{aligned} 2\sin\alpha - \cos\beta + 3\tan\gamma &= 3 \\ 4\sin\alpha + 2\cos\beta - 2\tan\gamma &= 2 \\ 6\sin\alpha - 3\cos\beta + \tan\gamma &= 9 \end{aligned} \]

35. Solve the following system of nonlinear equations for x, y, and z.

\[ \begin{aligned} x^2 + y^2 + z^2 &= 6 \\ x^2 - y^2 + 2z^2 &= 2 \\ 2x^2 + y^2 - z^2 &= 3 \end{aligned} \]

[Hint: Begin by making the substitutions $X = x^2$, $Y = y^2$, $Z = z^2$.]

36. Solve the following system for x, y, and z.

\[ \begin{aligned} \frac{1}{x} + \frac{2}{y} - \frac{4}{z} &= 1 \\ \frac{2}{x} + \frac{3}{y} + \frac{8}{z} &= 0 \\ -\frac{1}{x} + \frac{9}{y} + \frac{10}{z} &= 5 \end{aligned} \]

37. Find the coefficients a, b, c, and d so that the curve shown in the accompanying
figure is the graph of the equation $y = ax^3 + bx^2 + cx + d$.

[Figure Ex-37]

38. Find the coefficients a, b, c, and d so that the circle shown in the accompanying
figure is given by the equation $ax^2 + ay^2 + bx + cy + d = 0$.

[Figure Ex-38]


39. If the linear system

\[ \begin{aligned} a_1x + b_1y + c_1z &= 0 \\ a_2x - b_2y + c_2z &= 0 \\ a_3x + b_3y - c_3z &= 0 \end{aligned} \]

has only the trivial solution, what can be said about the solutions of the following
system?

\[ \begin{aligned} a_1x + b_1y + c_1z &= 3 \\ a_2x - b_2y + c_2z &= 7 \\ a_3x + b_3y - c_3z &= 11 \end{aligned} \]

40. (a) If A is a matrix with three rows and five columns, then
what is the maximum possible number of leading 1's in its
reduced row echelon form?

(b) If B is a matrix with three rows and six columns, then 
what is the maximum possible number of parameters in 
the general solution of the linear system with augmented 
matrix B? 

(c) If C is a matrix with five rows and three columns, then 
what is the minimum possible number of rows of zeros in 
any row echelon form of C? 
41. Describe all possible reduced row echelon forms of

(a) $\begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \end{bmatrix}$
(b) $\begin{bmatrix} a & b & c & d \\ e & f & g & h \\ i & j & k & l \\ m & n & p & q \end{bmatrix}$

42. Consider the system of equations

\[ \begin{aligned} ax + by &= 0 \\ cx + dy &= 0 \\ ex + fy &= 0 \end{aligned} \]

Discuss the relative positions of the lines ax + by = 0, cx + dy = 0, and ex + fy = 0
when the system has only the trivial solution and when it has nontrivial solutions.


Working with Proofs 

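43. (a) Prove that if $ad - bc \ne 0$, then the reduced row echelon form of

\[ \begin{bmatrix} a & b \\ c & d \end{bmatrix} \quad\text{is}\quad \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \]

(b) Use the result in part (a) to prove that if $ad - bc \ne 0$, then the linear system

\[ \begin{aligned} ax + by &= k \\ cx + dy &= l \end{aligned} \]

has exactly one solution.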


True-False Exercises

TF. In parts (a)-(i) determine whether the statement is true or false, and justify your
answer.

(a) If a matrix is in reduced row echelon form, then it is also in row echelon form.

(b) If an elementary row operation is applied to a matrix that is in row echelon form,
the resulting matrix will still be in row echelon form.

(c) Every matrix has a unique row echelon form.

(d) A homogeneous linear system in n unknowns whose corresponding augmented matrix has a
reduced row echelon form with r leading 1's has n - r free variables.

(e) All leading 1's in a matrix in row echelon form must occur in different columns.

(f) If every column of a matrix in row echelon form has a leading 1, then all entries
that are not leading 1's are zero.

(g) If a homogeneous linear system of n equations in n unknowns has a corresponding
augmented matrix with a reduced row echelon form containing n leading 1's, then the
linear system has only the trivial solution.

(h) If the reduced row echelon form of the augmented matrix for a linear system has a row
of zeros, then the system must have infinitely many solutions.

(i) If a linear system has more unknowns than equations, then it must have infinitely
many solutions.

Working with Technology

T1. Find the reduced row echelon form of the augmented matrix for the linear system:

\[ \begin{aligned} 6x_1 + x_2 + 4x_4 &= -3 \\ -9x_1 + 2x_2 + 3x_3 - 8x_4 &= 1 \\ 7x_1 - 4x_3 + 5x_4 &= 2 \end{aligned} \]

Use your result to determine whether the system is consistent and, if so, find its
solution.

T2. Find values of the constants A, B, C, and D that make the following equation an
identity (i.e., true for all values of x).

\[ \frac{3x^3 + 4x^2 - 6x}{(x^2 + 2x + 2)(x^2 - 1)} = \frac{Ax + B}{x^2 + 2x + 2} + \frac{C}{x - 1} + \frac{D}{x + 1} \]

[Hint: Obtain a common denominator on the right, and then equate corresponding
coefficients of the various powers of x in the two numerators. Students of calculus will
recognize this as a problem in partial fractions.]


1.3 Matrices and Matrix Operations 

Rectangular arrays of real numbers arise in contexts other than as augmented matrices for 
linear systems. In this section we will begin to study matrices as objects in their own right 
by defining operations of addition, subtraction, and multiplication on them. 


Matrix Notation and Terminology

In Section 1.2 we used rectangular arrays of numbers, called augmented matrices, to
abbreviate systems of linear equations. However, rectangular arrays of numbers occur
in other contexts as well. For example, the following rectangular array with three rows
and seven columns might describe the number of hours that a student spent studying
three subjects during a certain week:

            Mon.   Tues.   Wed.   Thurs.   Fri.   Sat.   Sun.
Math         2      3       2      4        1      4      2
History      0      3       1      4        3      2      2
Language     4      1       3      1        0      0      2

If we suppress the headings, then we are left with the following rectangular array of
numbers with three rows and seven columns, called a "matrix":

\[ \begin{bmatrix} 2 & 3 & 2 & 4 & 1 & 4 & 2 \\ 0 & 3 & 1 & 4 & 3 & 2 & 2 \\ 4 & 1 & 3 & 1 & 0 & 0 & 2 \end{bmatrix} \]

More generally, we make the following definition. 


DEFINITION 1 A matrix is a rectangular array of numbers. The numbers in the array 
are called the entries in the matrix. 


Matrix brackets are often omitted from 1 x 1 matrices, making it impossible to tell, for
example, whether the symbol 4 denotes the number "four" or the matrix [4]. This rarely
causes problems because it is usually possible to tell which is meant from the context.


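► EXAMPLE 1 Examples of Matrices

Some examples of matrices are

\[ \begin{bmatrix} 1 & 2 \\ 3 & 0 \\ -1 & 4 \end{bmatrix}, \quad \begin{bmatrix} 2 & 1 & 0 & -3 \end{bmatrix}, \quad \begin{bmatrix} e & \pi & -\sqrt{2} \\ 0 & \tfrac{1}{2} & 1 \\ 0 & 0 & 0 \end{bmatrix}, \quad \begin{bmatrix} 1 \\ 3 \end{bmatrix}, \quad [4] \]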


The size of a matrix is described in terms of the number of rows (horizontal lines)
and columns (vertical lines) it contains. For example, the first matrix in Example 1 has
three rows and two columns, so its size is 3 by 2 (written 3 x 2). In a size description,
the first number always denotes the number of rows, and the second denotes the number
of columns. The remaining matrices in Example 1 have sizes 1 x 4, 3 x 3, 2 x 1, and
1 x 1, respectively.

A matrix with only one row, such as the second in Example 1, is called a row vector
(or a row matrix), and a matrix with only one column, such as the fourth in that example,
is called a column vector (or a column matrix). The fifth matrix in that example is both
a row vector and a column vector.

We will use capital letters to denote matrices and lowercase letters to denote numerical
quantities; thus we might write

\[ A = \begin{bmatrix} 2 & 1 & 7 \\ 3 & 4 & 2 \end{bmatrix} \quad\text{or}\quad C = \begin{bmatrix} a & b & c \\ d & e & f \end{bmatrix} \]

When discussing matrices, it is common to refer to numerical quantities as scalars. Unless
stated otherwise, scalars will be real numbers; complex scalars will be considered later in
the text.

The entry that occurs in row i and column j of a matrix A will be denoted by $a_{ij}$.
Thus a general 3 x 4 matrix might be written as

\[ A = \begin{bmatrix} a_{11} & a_{12} & a_{13} & a_{14} \\ a_{21} & a_{22} & a_{23} & a_{24} \\ a_{31} & a_{32} & a_{33} & a_{34} \end{bmatrix} \]

and a general m x n matrix as

\[ A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix} \tag{1} \]

When a compact notation is desired, the preceding matrix can be written as

\[ [a_{ij}]_{m \times n} \quad\text{or}\quad [a_{ij}] \]

the first notation being used when it is important in the discussion to know the size,
and the second when the size need not be emphasized. Usually, we will match the letter
denoting a matrix with the letter denoting its entries; thus, for a matrix B we would
generally use $b_{ij}$ for the entry in row i and column j, and for a matrix C we would use
the notation $c_{ij}$.

The entry in row i and column j of a matrix A is also commonly denoted by the
symbol $(A)_{ij}$. Thus, for matrix (1) above, we have

\[ (A)_{ij} = a_{ij} \]

and for the matrix

\[ A = \begin{bmatrix} 2 & -3 \\ 7 & 0 \end{bmatrix} \]

we have $(A)_{11} = 2$, $(A)_{12} = -3$, $(A)_{21} = 7$, and $(A)_{22} = 0$.

Row and column vectors are of special importance, and it is common practice to
denote them by boldface lowercase letters rather than capital letters. For such matrices,
double subscripting of the entries is unnecessary. Thus a general 1 x n row vector a and
a general m x 1 column vector b would be written as

\[ \mathbf{a} = \begin{bmatrix} a_1 & a_2 & \cdots & a_n \end{bmatrix} \quad\text{and}\quad \mathbf{b} = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{bmatrix} \]

A matrix A with n rows and n columns is called a square matrix of order n, and the
entries $a_{11}, a_{22}, \ldots, a_{nn}$ in (2) are said to be on the main diagonal of A.

\[ \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix} \tag{2} \]

Operations on Matrices

So far, we have used matrices to abbreviate the work in solving systems of linear
equations. For other applications, however, it is desirable to develop an "arithmetic of
matrices" in which matrices can be added, subtracted, and multiplied in a useful way. The
remainder of this section will be devoted to developing this arithmetic.


DEFINITION 2 Two matrices are defined to be equal if they have the same size and 
their corresponding entries are equal. 
The equality of two matrices $A = [a_{ij}]$ and $B = [b_{ij}]$ of the same size can be
expressed either by writing $(A)_{ij} = (B)_{ij}$ or by writing $a_{ij} = b_{ij}$, where it
is understood that the equalities hold for all values of i and j.


► EXAMPLE 2 Equality of Matrices

Consider the matrices

\[ A = \begin{bmatrix} 2 & 1 \\ 3 & x \end{bmatrix}, \quad B = \begin{bmatrix} 2 & 1 \\ 3 & 5 \end{bmatrix}, \quad C = \begin{bmatrix} 2 & 1 & 0 \\ 3 & 4 & 0 \end{bmatrix} \]

If x = 5, then A = B, but for all other values of x the matrices A and B are not equal,
since not all of their corresponding entries are equal. There is no value of x for which
A = C since A and C have different sizes. ◄


DEFINITION 3 If A and B are matrices of the same size, then the sum A + B is the 
matrix obtained by adding the entries of B to the corresponding entries of A, and 
the difference A — B is the matrix obtained by subtracting the entries of B from the 
corresponding entries of A. Matrices of different sizes cannot be added or subtracted. 


In matrix notation, if $A = [a_{ij}]$ and $B = [b_{ij}]$ have the same size, then

\[ (A + B)_{ij} = (A)_{ij} + (B)_{ij} = a_{ij} + b_{ij} \quad\text{and}\quad (A - B)_{ij} = (A)_{ij} - (B)_{ij} = a_{ij} - b_{ij} \]


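► EXAMPLE 3 Addition and Subtraction

Consider the matrices

\[ A = \begin{bmatrix} 2 & 1 & 0 & 3 \\ -1 & 0 & 2 & 4 \\ 4 & -2 & 7 & 0 \end{bmatrix}, \quad B = \begin{bmatrix} -4 & 3 & 5 & 1 \\ 2 & 2 & 0 & -1 \\ 3 & 2 & -4 & 5 \end{bmatrix}, \quad C = \begin{bmatrix} 1 & 1 \\ 2 & 2 \end{bmatrix} \]

Then

\[ A + B = \begin{bmatrix} -2 & 4 & 5 & 4 \\ 1 & 2 & 2 & 3 \\ 7 & 0 & 3 & 5 \end{bmatrix} \quad\text{and}\quad A - B = \begin{bmatrix} 6 & -2 & -5 & 2 \\ -3 & -2 & 2 & 5 \\ 1 & -4 & 11 & -5 \end{bmatrix} \]

The expressions A + C, B + C, A - C, and B - C are undefined.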
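Definition 3 translates directly into code. The following is a minimal sketch in which a matrix is stored as a list of rows; the helper names `shape` and `add` are illustrative assumptions rather than part of any library. It reproduces the sums of Example 3 and rejects matrices of different sizes.

```python
def shape(M):
    """Return (number of rows, number of columns) of a matrix stored as a list of rows."""
    return len(M), len(M[0])

def add(A, B, sign=1):
    """Entrywise A + B, or A - B when sign is -1; the sizes must agree."""
    if shape(A) != shape(B):
        raise ValueError("matrices of different sizes cannot be added or subtracted")
    return [[a + sign * b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(A, B)]

A = [[2, 1, 0, 3], [-1, 0, 2, 4], [4, -2, 7, 0]]
B = [[-4, 3, 5, 1], [2, 2, 0, -1], [3, 2, -4, 5]]
C = [[1, 1], [2, 2]]

print(add(A, B))        # A + B
print(add(A, B, -1))    # A - B
# add(A, C) raises ValueError because A is 3 x 4 and C is 2 x 2
```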


DEFINITION 4 If A is any matrix and c is any scalar, then the product cA is the matrix 
obtained by multiplying each entry of the matrix A by c. The matrix cA is said to be 
a scalar multiple of A. 


In matrix notation, if $A = [a_{ij}]$, then

\[ (cA)_{ij} = c(A)_{ij} = ca_{ij} \]

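► EXAMPLE 4 Scalar Multiples

For the matrices

\[ A = \begin{bmatrix} 2 & 3 & 4 \\ 1 & 3 & 1 \end{bmatrix}, \quad B = \begin{bmatrix} 0 & 2 & 7 \\ -1 & 3 & -5 \end{bmatrix}, \quad C = \begin{bmatrix} 9 & -6 & 3 \\ 3 & 0 & 12 \end{bmatrix} \]

we have

\[ 2A = \begin{bmatrix} 4 & 6 & 8 \\ 2 & 6 & 2 \end{bmatrix}, \quad (-1)B = \begin{bmatrix} 0 & -2 & -7 \\ 1 & -3 & 5 \end{bmatrix}, \quad \tfrac{1}{3}C = \begin{bmatrix} 3 & -2 & 1 \\ 1 & 0 & 4 \end{bmatrix} \]

It is common practice to denote $(-1)B$ by $-B$.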
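A scalar multiple is computed one entry at a time, exactly as Definition 4 states. The sketch below assumes a list-of-rows representation, and the function name `scalar_multiple` is hypothetical. Exact fractions are used so that multiplying by 1/3 introduces no roundoff.

```python
from fractions import Fraction

def scalar_multiple(c, A):
    """Return cA, the matrix obtained by multiplying every entry of A by the scalar c."""
    return [[c * entry for entry in row] for row in A]

A = [[2, 3, 4], [1, 3, 1]]
C = [[9, -6, 3], [3, 0, 12]]

two_A = scalar_multiple(2, A)
third_C = scalar_multiple(Fraction(1, 3), C)   # exact arithmetic for the scalar 1/3
print(two_A)
print(third_C)
```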
Thus far we have defined multiplication of a matrix by a scalar but not the multiplication
of two matrices. Since matrices are added by adding corresponding entries
and subtracted by subtracting corresponding entries, it would seem natural to define 
multiplication of matrices by multiplying corresponding entries. However, it turns out 
that such a definition would not be very useful for most problems. Experience has led 
mathematicians to the following more useful definition of matrix multiplication. 


DEFINITION 5 If A is an m x r matrix and B is an r x n matrix, then the product 
AB is the m x n matrix whose entries are determined as follows: To find the entry in 
row i and column j of AB, single out row i from the matrix A and column j from 
the matrix B. Multiply the corresponding entries from the row and column together,
and then add up the resulting products. 


► EXAMPLE 5 Multiplying Matrices

Consider the matrices

\[ A = \begin{bmatrix} 1 & 2 & 4 \\ 2 & 6 & 0 \end{bmatrix}, \quad B = \begin{bmatrix} 4 & 1 & 4 & 3 \\ 0 & -1 & 3 & 1 \\ 2 & 7 & 5 & 2 \end{bmatrix} \]

Since A is a 2 x 3 matrix and B is a 3 x 4 matrix, the product AB is a 2 x 4 matrix.
To determine, for example, the entry in row 2 and column 3 of AB, we single out row 2
from A and column 3 from B. Then we multiply corresponding entries together and add up
these products:

\[ (2 \cdot 4) + (6 \cdot 3) + (0 \cdot 5) = 26 \]

The entry in row 1 and column 4 of AB is computed as follows:

\[ (1 \cdot 3) + (2 \cdot 1) + (4 \cdot 2) = 13 \]

The computations for the remaining entries are

\[ \begin{aligned} (1 \cdot 4) + (2 \cdot 0) + (4 \cdot 2) &= 12 \\ (1 \cdot 1) - (2 \cdot 1) + (4 \cdot 7) &= 27 \\ (1 \cdot 4) + (2 \cdot 3) + (4 \cdot 5) &= 30 \\ (2 \cdot 4) + (6 \cdot 0) + (0 \cdot 2) &= 8 \\ (2 \cdot 1) - (6 \cdot 1) + (0 \cdot 7) &= -4 \\ (2 \cdot 3) + (6 \cdot 1) + (0 \cdot 2) &= 12 \end{aligned} \]

\[ AB = \begin{bmatrix} 12 & 27 & 30 & 13 \\ 8 & -4 & 26 & 12 \end{bmatrix} \]

◄
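The procedure in Definition 5 can be written as a short function. This is a minimal sketch using a list-of-rows representation; the name `matmul` is an illustrative choice. It first checks that the number of columns of the first factor equals the number of rows of the second, and then forms each entry as a sum of products of a row of A with a column of B.

```python
def matmul(A, B):
    """Product AB computed entry by entry from rows of A and columns of B."""
    m, r = len(A), len(A[0])
    if len(B) != r:
        raise ValueError("columns of the first factor must equal rows of the second")
    n = len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(r)) for j in range(n)]
            for i in range(m)]

A = [[1, 2, 4], [2, 6, 0]]                        # 2 x 3
B = [[4, 1, 4, 3], [0, -1, 3, 1], [2, 7, 5, 2]]   # 3 x 4

AB = matmul(A, B)                                 # 2 x 4
print(AB)
# matmul(B, A) raises ValueError: B has 4 columns but A has only 2 rows
```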


The definition of matrix multiplication requires that the number of columns of the
first factor A be the same as the number of rows of the second factor B in order to form
the product AB. If this condition is not satisfied, the product is undefined. A convenient
way to determine whether a product of two matrices is defined is to write down the size
of the first factor and, to the right of it, write down the size of the second factor. If, as in
(3), the inside numbers are the same, then the product is defined. The outside numbers
then give the size of the product.

\[ \underset{m \times r}{A} \quad \underset{r \times n}{B} \;=\; \underset{m \times n}{AB} \tag{3} \]

Here the inside numbers are the two r's, and the outside numbers are m and n.

► EXAMPLE 6 Determining Whether a Product Is Defined

Suppose that A, B, and C are matrices with the following sizes:

    A        B        C
  3 x 4    4 x 7    7 x 3

Then by (3), AB is defined and is a 3 x 7 matrix; BC is defined and is a 4 x 3 matrix; and
CA is defined and is a 7 x 4 matrix. The products AC, CB, and BA are all undefined. ◄

In general, if $A = [a_{ij}]$ is an m x r matrix and $B = [b_{ij}]$ is an r x n matrix, then, as
illustrated by row i of A and column j of B in the following display,

\[ AB = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1r} \\ a_{21} & a_{22} & \cdots & a_{2r} \\ \vdots & \vdots & & \vdots \\ a_{i1} & a_{i2} & \cdots & a_{ir} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mr} \end{bmatrix} \begin{bmatrix} b_{11} & b_{12} & \cdots & b_{1j} & \cdots & b_{1n} \\ b_{21} & b_{22} & \cdots & b_{2j} & \cdots & b_{2n} \\ \vdots & \vdots & & \vdots & & \vdots \\ b_{r1} & b_{r2} & \cdots & b_{rj} & \cdots & b_{rn} \end{bmatrix} \tag{4} \]

the entry $(AB)_{ij}$ in row i and column j of AB is given by

\[ (AB)_{ij} = a_{i1}b_{1j} + a_{i2}b_{2j} + a_{i3}b_{3j} + \cdots + a_{ir}b_{rj} \tag{5} \]

Formula (5) is called the row-column rule for matrix multiplication.

Historical Note The concept of matrix multiplication is due to the German mathematician
Gotthold Eisenstein, who introduced the idea around 1844 to simplify the process of
making substitutions in linear systems. The idea was then expanded on and formalized by
Cayley in his Memoir on the Theory of Matrices that was published in 1858. Eisenstein
was a pupil of Gauss, who ranked him as the equal of Isaac Newton and Archimedes.
However, Eisenstein, suffering from bad health his entire life, died at age 30, so his
potential was never realized.

Gotthold Eisenstein (1823-1852)
[Image: http://www-history.mcs.st-andrews.ac.uk/Biographies/Eisenstein.html]

Partitioned Matrices

A matrix can be subdivided or partitioned into smaller matrices by inserting horizontal
and vertical rules between selected rows and columns. For example, the following are
three possible partitions of a general 3 x 4 matrix A: the first is a partition of A into
four submatrices $A_{11}$, $A_{12}$, $A_{21}$, and $A_{22}$; the second is a partition of A into its row
vectors $\mathbf{r}_1$, $\mathbf{r}_2$, and $\mathbf{r}_3$; and the third is a partition of A into its column vectors
$\mathbf{c}_1$, $\mathbf{c}_2$, $\mathbf{c}_3$, and $\mathbf{c}_4$:

\[ A = \left[\begin{array}{ccc|c} a_{11} & a_{12} & a_{13} & a_{14} \\ a_{21} & a_{22} & a_{23} & a_{24} \\ \hline a_{31} & a_{32} & a_{33} & a_{34} \end{array}\right] = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix} \]

\[ A = \left[\begin{array}{cccc} a_{11} & a_{12} & a_{13} & a_{14} \\ \hline a_{21} & a_{22} & a_{23} & a_{24} \\ \hline a_{31} & a_{32} & a_{33} & a_{34} \end{array}\right] = \begin{bmatrix} \mathbf{r}_1 \\ \mathbf{r}_2 \\ \mathbf{r}_3 \end{bmatrix} \]

\[ A = \left[\begin{array}{c|c|c|c} a_{11} & a_{12} & a_{13} & a_{14} \\ a_{21} & a_{22} & a_{23} & a_{24} \\ a_{31} & a_{32} & a_{33} & a_{34} \end{array}\right] = \begin{bmatrix} \mathbf{c}_1 & \mathbf{c}_2 & \mathbf{c}_3 & \mathbf{c}_4 \end{bmatrix} \]

Matrix Multiplication by Columns and by Rows

Partitioning has many uses, one of which is for finding particular rows or columns of a
matrix product AB without computing the entire product. Specifically, the following
formulas, whose proofs are left as exercises, show how individual column vectors of AB can
be obtained by partitioning B into column vectors and how individual row vectors of
AB can be obtained by partitioning A into row vectors.

\[ AB = A\begin{bmatrix} \mathbf{b}_1 & \mathbf{b}_2 & \cdots & \mathbf{b}_n \end{bmatrix} = \begin{bmatrix} A\mathbf{b}_1 & A\mathbf{b}_2 & \cdots & A\mathbf{b}_n \end{bmatrix} \tag{6} \]

(AB computed column by column)

\[ AB = \begin{bmatrix} \mathbf{a}_1 \\ \mathbf{a}_2 \\ \vdots \\ \mathbf{a}_m \end{bmatrix} B = \begin{bmatrix} \mathbf{a}_1 B \\ \mathbf{a}_2 B \\ \vdots \\ \mathbf{a}_m B \end{bmatrix} \tag{7} \]

(AB computed row by row)

In words, these formulas state that

\[ j\text{th column vector of } AB = A\,[\,j\text{th column vector of } B\,] \tag{8} \]

\[ i\text{th row vector of } AB = [\,i\text{th row vector of } A\,]\,B \tag{9} \]

We now have three methods for computing a product of two matrices: entry by entry using
Definition 5, column by column using Formula (8), and row by row using Formula (9). We
will call these the entry method, the column method, and the row method, respectively.
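Formulas (8) and (9) can be sketched in code as follows. The function names `column_of_product` and `row_of_product` are illustrative assumptions, matrices are stored as lists of rows, and the indices are 0-based (so index 1 means the second column). Neither function computes the whole product AB.

```python
def column_of_product(A, B, j):
    """Column j of AB, computed as A times column j of B (0-based index j)."""
    b_j = [row[j] for row in B]
    return [sum(a * b for a, b in zip(row, b_j)) for row in A]

def row_of_product(A, B, i):
    """Row i of AB, computed as row i of A times B (0-based index i)."""
    a_i = A[i]
    return [sum(a_i[k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]

# Matrices of Example 5
A = [[1, 2, 4], [2, 6, 0]]
B = [[4, 1, 4, 3], [0, -1, 3, 1], [2, 7, 5, 2]]

second_column = column_of_product(A, B, 1)
first_row = row_of_product(A, B, 0)
print(second_column, first_row)
```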


► EXAMPLE 7 Example 5 Revisited

If A and B are the matrices in Example 5, then from (8) the second column vector of
AB can be obtained by the computation

\[ \begin{bmatrix} 1 & 2 & 4 \\ 2 & 6 & 0 \end{bmatrix} \begin{bmatrix} 1 \\ -1 \\ 7 \end{bmatrix} = \begin{bmatrix} 27 \\ -4 \end{bmatrix} \]

(second column of B on the left, second column of AB on the right), and from (9) the
first row vector of AB can be obtained by the computation

\[ \begin{bmatrix} 1 & 2 & 4 \end{bmatrix} \begin{bmatrix} 4 & 1 & 4 & 3 \\ 0 & -1 & 3 & 1 \\ 2 & 7 & 5 & 2 \end{bmatrix} = \begin{bmatrix} 12 & 27 & 30 & 13 \end{bmatrix} \]

(first row of A on the left, first row of AB on the right). ◄

Matrix Products as Linear Combinations

The following definition provides yet another way of thinking about matrix
multiplication.

DEFINITION 6 If $A_1, A_2, \ldots, A_r$ are matrices of the same size, and if $c_1, c_2, \ldots, c_r$
are scalars, then an expression of the form

\[ c_1A_1 + c_2A_2 + \cdots + c_rA_r \]

is called a linear combination of $A_1, A_2, \ldots, A_r$ with coefficients $c_1, c_2, \ldots, c_r$.

Definition 6 is applicable, in particular, to row and column vectors. Thus, for example, a
linear combination of column vectors $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_r$ of the same size is an
expression of the form

\[ c_1\mathbf{x}_1 + c_2\mathbf{x}_2 + \cdots + c_r\mathbf{x}_r \]


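To see how matrix products can be viewed as linear combinations, let A be an m x n
matrix and x an n x 1 column vector, say

\[ A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix} \quad\text{and}\quad \mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} \]

Then

\[ A\mathbf{x} = \begin{bmatrix} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n \\ a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n \\ \vdots \\ a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n \end{bmatrix} = x_1 \begin{bmatrix} a_{11} \\ a_{21} \\ \vdots \\ a_{m1} \end{bmatrix} + x_2 \begin{bmatrix} a_{12} \\ a_{22} \\ \vdots \\ a_{m2} \end{bmatrix} + \cdots + x_n \begin{bmatrix} a_{1n} \\ a_{2n} \\ \vdots \\ a_{mn} \end{bmatrix} \tag{10} \]

This proves the following theorem.

THEOREM 1.3.1 If A is an m x n matrix, and if x is an n x 1 column vector, then the
product Ax can be expressed as a linear combination of the column vectors of A in which
the coefficients are the entries of x.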


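► EXAMPLE 8 Matrix Products as Linear Combinations

The matrix product

\[ \begin{bmatrix} -1 & 3 & 2 \\ 1 & 2 & -3 \\ 2 & 1 & -2 \end{bmatrix} \begin{bmatrix} 2 \\ -1 \\ 3 \end{bmatrix} = \begin{bmatrix} 1 \\ -9 \\ -3 \end{bmatrix} \]

can be written as the following linear combination of column vectors:

\[ 2 \begin{bmatrix} -1 \\ 1 \\ 2 \end{bmatrix} - 1 \begin{bmatrix} 3 \\ 2 \\ 1 \end{bmatrix} + 3 \begin{bmatrix} 2 \\ -3 \\ -2 \end{bmatrix} = \begin{bmatrix} 1 \\ -9 \\ -3 \end{bmatrix} \]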
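Theorem 1.3.1 can be checked numerically on the product in Example 8. The sketch below assumes a list-of-rows representation, and the helper names `column` and `linear_combination` are hypothetical. It computes Ax twice: once as a linear combination of the columns of A with the entries of x as coefficients, and once by the row-column rule.

```python
def column(A, j):
    """Column j of A as a flat list (0-based index j)."""
    return [row[j] for row in A]

def linear_combination(coefficients, vectors):
    """Return c1*v1 + c2*v2 + ... for vectors of the same size."""
    size = len(vectors[0])
    return [sum(c * v[i] for c, v in zip(coefficients, vectors)) for i in range(size)]

A = [[-1, 3, 2], [1, 2, -3], [2, 1, -2]]
x = [2, -1, 3]

as_combination = linear_combination(x, [column(A, j) for j in range(len(x))])
by_row_column = [sum(a * xj for a, xj in zip(row, x)) for row in A]
print(as_combination, by_row_column)
```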
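► EXAMPLE 9 Columns of a Product AB as Linear Combinations

We showed in Example 5 that

\[ AB = \begin{bmatrix} 1 & 2 & 4 \\ 2 & 6 & 0 \end{bmatrix} \begin{bmatrix} 4 & 1 & 4 & 3 \\ 0 & -1 & 3 & 1 \\ 2 & 7 & 5 & 2 \end{bmatrix} = \begin{bmatrix} 12 & 27 & 30 & 13 \\ 8 & -4 & 26 & 12 \end{bmatrix} \]

It follows from Formula (6) and Theorem 1.3.1 that the jth column vector of AB can be
expressed as a linear combination of the column vectors of A in which the coefficients
in the linear combination are the entries from the jth column of B. The computations
are as follows:

\[ \begin{aligned} \begin{bmatrix} 12 \\ 8 \end{bmatrix} &= 4 \begin{bmatrix} 1 \\ 2 \end{bmatrix} + 0 \begin{bmatrix} 2 \\ 6 \end{bmatrix} + 2 \begin{bmatrix} 4 \\ 0 \end{bmatrix} \\ \begin{bmatrix} 27 \\ -4 \end{bmatrix} &= \begin{bmatrix} 1 \\ 2 \end{bmatrix} - \begin{bmatrix} 2 \\ 6 \end{bmatrix} + 7 \begin{bmatrix} 4 \\ 0 \end{bmatrix} \\ \begin{bmatrix} 30 \\ 26 \end{bmatrix} &= 4 \begin{bmatrix} 1 \\ 2 \end{bmatrix} + 3 \begin{bmatrix} 2 \\ 6 \end{bmatrix} + 5 \begin{bmatrix} 4 \\ 0 \end{bmatrix} \\ \begin{bmatrix} 13 \\ 12 \end{bmatrix} &= 3 \begin{bmatrix} 1 \\ 2 \end{bmatrix} + \begin{bmatrix} 2 \\ 6 \end{bmatrix} + 2 \begin{bmatrix} 4 \\ 0 \end{bmatrix} \end{aligned} \]

◄

Column-Row Expansion

Partitioning provides yet another way to view matrix multiplication. Specifically,
suppose that an m x r matrix A is partitioned into its r column vectors
$\mathbf{c}_1, \mathbf{c}_2, \ldots, \mathbf{c}_r$ (each of size m x 1) and an r x n matrix B is partitioned into its
r row vectors $\mathbf{r}_1, \mathbf{r}_2, \ldots, \mathbf{r}_r$ (each of size 1 x n). Each term in the sum

\[ \mathbf{c}_1\mathbf{r}_1 + \mathbf{c}_2\mathbf{r}_2 + \cdots + \mathbf{c}_r\mathbf{r}_r \]

has size m x n so the sum itself is an m x n matrix. We leave it as an exercise for you to
verify that the entry in row i and column j of the sum is given by the expression on the
right side of Formula (5), from which it follows that

\[ AB = \mathbf{c}_1\mathbf{r}_1 + \mathbf{c}_2\mathbf{r}_2 + \cdots + \mathbf{c}_r\mathbf{r}_r \tag{11} \]

We call (11) the column-row expansion of AB.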


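► EXAMPLE 10 Column-Row Expansion

Find the column-row expansion of the product

\[ AB = \begin{bmatrix} 1 & 3 \\ 2 & -1 \end{bmatrix} \begin{bmatrix} 2 & 0 & 4 \\ -3 & 5 & 1 \end{bmatrix} \tag{12} \]

Solution The column vectors of A and the row vectors of B are, respectively,

\[ \mathbf{c}_1 = \begin{bmatrix} 1 \\ 2 \end{bmatrix}, \quad \mathbf{c}_2 = \begin{bmatrix} 3 \\ -1 \end{bmatrix}; \qquad \mathbf{r}_1 = \begin{bmatrix} 2 & 0 & 4 \end{bmatrix}, \quad \mathbf{r}_2 = \begin{bmatrix} -3 & 5 & 1 \end{bmatrix} \]

Thus, it follows from (11) that the column-row expansion of AB is

\[ AB = \begin{bmatrix} 1 \\ 2 \end{bmatrix} \begin{bmatrix} 2 & 0 & 4 \end{bmatrix} + \begin{bmatrix} 3 \\ -1 \end{bmatrix} \begin{bmatrix} -3 & 5 & 1 \end{bmatrix} = \begin{bmatrix} 2 & 0 & 4 \\ 4 & 0 & 8 \end{bmatrix} + \begin{bmatrix} -9 & 15 & 3 \\ 3 & -5 & -1 \end{bmatrix} \tag{13} \]

As a check, we leave it for you to confirm that the product in (12) and the sum in (13)
both yield

\[ AB = \begin{bmatrix} -7 & 15 & 7 \\ 7 & -5 & 7 \end{bmatrix} \]

The main use of the column-row expansion is for developing theoretical results rather
than for numerical computations.

◄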
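The column-row expansion (11) can be sketched in a few lines. The helper names `outer`, `column_row_terms`, and `matrix_sum` are illustrative assumptions, and matrices are stored as lists of rows. Each term is the product of a column of A (an m x 1 matrix) with the corresponding row of B (a 1 x n matrix), and the terms are then added entrywise.

```python
def outer(c, r):
    """Product of a column vector c (m x 1) and a row vector r (1 x n): an m x n matrix."""
    return [[ci * rj for rj in r] for ci in c]

def column_row_terms(A, B):
    """The matrices c_k r_k, one for each column c_k of A and matching row r_k of B."""
    return [outer([row[k] for row in A], B[k]) for k in range(len(B))]

def matrix_sum(terms):
    """Entrywise sum of a list of matrices that all have the same size."""
    rows, cols = len(terms[0]), len(terms[0][0])
    return [[sum(t[i][j] for t in terms) for j in range(cols)] for i in range(rows)]

# Matrices of Example 10
A = [[1, 3], [2, -1]]
B = [[2, 0, 4], [-3, 5, 1]]

terms = column_row_terms(A, B)
AB = matrix_sum(terms)
print(terms)
print(AB)
```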


Matrix Form of a Linear System

Matrix multiplication has an important application to systems of linear equations.
Consider a system of m linear equations in n unknowns:

\[ \begin{aligned} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= b_1 \\ a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= b_2 \\ &\;\;\vdots \\ a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n &= b_m \end{aligned} \]

Since two matrices are equal if and only if their corresponding entries are equal, we can
replace the m equations in this system by the single matrix equation

\[ \begin{bmatrix} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n \\ a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n \\ \vdots \\ a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{bmatrix} \]

The m x 1 matrix on the left side of this equation can be written as a product to give

\[ \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{bmatrix} \]

If we designate these matrices by A, x, and b, respectively, then we can replace the original
system of m equations in n unknowns by the single matrix equation

\[ A\mathbf{x} = \mathbf{b} \]

The matrix A in this equation is called the coefficient matrix of the system. The
augmented matrix for the system is obtained by adjoining b to A as the last column; thus
the augmented matrix is

\[ [A \mid \mathbf{b}] = \left[\begin{array}{cccc|c} a_{11} & a_{12} & \cdots & a_{1n} & b_1 \\ a_{21} & a_{22} & \cdots & a_{2n} & b_2 \\ \vdots & \vdots & & \vdots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} & b_m \end{array}\right] \]

The vertical partition line in the augmented matrix [A | b] is optional, but is a useful way
of visually separating the coefficient matrix A from the column vector b.

Transpose of a Matrix

We conclude this section by defining two matrix operations that have no analogs in the
arithmetic of real numbers.

DEFINITION 7 If A is any m x n matrix, then the transpose of A, denoted by $A^T$, is
defined to be the n x m matrix that results by interchanging the rows and columns
of A; that is, the first column of $A^T$ is the first row of A, the second column of $A^T$ is
the second row of A, and so forth.

► EXAMPLE 11 Some Transposes

The following are some examples of matrices and their transposes.

\[ A = \begin{bmatrix} a_{11} & a_{12} & a_{13} & a_{14} \\ a_{21} & a_{22} & a_{23} & a_{24} \\ a_{31} & a_{32} & a_{33} & a_{34} \end{bmatrix}, \quad B = \begin{bmatrix} 2 & 3 \\ 1 & 4 \\ 5 & 6 \end{bmatrix}, \quad C = \begin{bmatrix} 1 & 3 & 5 \end{bmatrix}, \quad D = [4] \]

\[ A^T = \begin{bmatrix} a_{11} & a_{21} & a_{31} \\ a_{12} & a_{22} & a_{32} \\ a_{13} & a_{23} & a_{33} \\ a_{14} & a_{24} & a_{34} \end{bmatrix}, \quad B^T = \begin{bmatrix} 2 & 1 & 5 \\ 3 & 4 & 6 \end{bmatrix}, \quad C^T = \begin{bmatrix} 1 \\ 3 \\ 5 \end{bmatrix}, \quad D^T = [4] \]

◄

Observe that not only are the columns of $A^T$ the rows of A, but the rows of $A^T$ are
the columns of A. Thus the entry in row i and column j of $A^T$ is the entry in row j and
column i of A; that is,

\[ (A^T)_{ij} = (A)_{ji} \tag{14} \]

Note the reversal of the subscripts.

In the special case where A is a square matrix, the transpose of A can be obtained
by interchanging entries that are symmetrically positioned about the main diagonal. In
(15) we see that $A^T$ can also be obtained by "reflecting" A about its main diagonal.

\[ A = \begin{bmatrix} 1 & -2 & 4 \\ 3 & 7 & 0 \\ -5 & 8 & 6 \end{bmatrix} \;\longrightarrow\; A^T = \begin{bmatrix} 1 & 3 & -5 \\ -2 & 7 & 8 \\ 4 & 0 & 6 \end{bmatrix} \tag{15} \]

Interchange entries that are symmetrically positioned about the main diagonal.
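Formula (14) gives a direct way to compute a transpose. In this minimal sketch the function name `transpose` is an illustrative choice and matrices are stored as lists of rows; Python's built-in `zip(*A)` groups the entries of A by column, and each column of A becomes a row of the result.

```python
def transpose(A):
    """Return the transpose of A: row i of the result is column i of A."""
    return [list(col) for col in zip(*A)]

A = [[1, -2, 4], [3, 7, 0], [-5, 8, 6]]    # the square matrix in (15)
C = [[1, 3, 5]]                            # a 1 x 3 row vector

At = transpose(A)
Ct = transpose(C)                          # a 3 x 1 column vector
print(At)
print(Ct)
```

Applying `transpose` twice returns the original matrix, which is a quick sanity check on any implementation.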



James Sylvester (1814-1897) and Arthur Cayley (1821-1895)


Historical Note The term matrix was first used by the English mathematician 
James Sylvester, who defined the term in 1850 to be an "oblong arrangement 
of terms." Sylvester communicated his work on matrices to a fellow English 
mathematician and lawyer named Arthur Cayley, who then introduced some of 
the basic operations on matrices in a book entitled Memoir on the Theory of 
Matrices that was published in 1858. As a matter of interest, Sylvester, who was 
Jewish, did not get his college degree because he refused to sign a required 
oath to the Church of England. He was appointed to a chair at the University of 
Virginia in the United States but resigned after swatting a student with a stick 
because he was reading a newspaper in class. Sylvester, thinking he had killed 
the student, fled back to England on the first available ship. Fortunately, the 
student was not dead, just in shock! 

[ Images : © Bettmann/CORBIS ( Sylvester ); 

Photo Researchers/Getty Images (Cayley)] 
Trace of a Matrix

DEFINITION 8 If A is a square matrix, then the trace of A, denoted by tr(A), is defined
to be the sum of the entries on the main diagonal of A. The trace of A is undefined
if A is not a square matrix.

► EXAMPLE 12 Trace

The following are examples of matrices and their traces.

\[ A = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}, \quad B = \begin{bmatrix} -1 & 2 & 7 & 0 \\ 3 & 5 & -8 & 4 \\ 1 & 2 & 7 & -3 \\ 4 & -2 & 1 & 0 \end{bmatrix} \]

\[ \operatorname{tr}(A) = a_{11} + a_{22} + a_{33}, \qquad \operatorname{tr}(B) = -1 + 5 + 7 + 0 = 11 \]

◄

In the exercises you will have some practice working with the transpose and trace 
operations. 
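Definition 8 can be sketched as a short function. The name `trace` is an illustrative choice and matrices are stored as lists of rows; the function refuses non-square input, matching the statement that the trace is undefined in that case.

```python
def trace(A):
    """Sum of the main-diagonal entries of a square matrix A."""
    if any(len(row) != len(A) for row in A):
        raise ValueError("the trace is defined only for square matrices")
    return sum(A[i][i] for i in range(len(A)))

# Matrix B of Example 12
B = [[-1, 2, 7, 0], [3, 5, -8, 4], [1, 2, 7, -3], [4, -2, 1, 0]]
print(trace(B))            # -1 + 5 + 7 + 0
# trace([[1, 2, 3], [4, 5, 6]]) raises ValueError (2 x 3 is not square)
```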


Exercise Set 1.3 


In Exercises 1-2, suppose that A, B, C, D, and E are matrices with the following sizes:

    A          B          C          D          E
 (4 x 5)    (4 x 5)    (5 x 2)    (4 x 2)    (5 x 4)

In each part, determine whether the given matrix expression is defined. For those that
are defined, give the size of the resulting matrix.

1. (a) $BA$  (b) $AB^T$  (c) $AC + D$  (d) $E(AC)$  (e) $A - 3E^T$  (f) $E(5B + A)$

2. (a) $CD^T$  (b) $DC$  (c) $BC - 2D$  (d) $D^T(BE)$  (e) $B^TD + ED$  (f) $BA^T + D$

In Exercises 3-6, use the following matrices to compute the indicated expression if it
is defined.

\[ A = \begin{bmatrix} 3 & 0 \\ -1 & 2 \\ 1 & 1 \end{bmatrix}, \quad B = \begin{bmatrix} 4 & -1 \\ 0 & 2 \end{bmatrix}, \quad C = \begin{bmatrix} 1 & 4 & 2 \\ 3 & 1 & 5 \end{bmatrix}, \quad D = \begin{bmatrix} 1 & 5 & 2 \\ -1 & 0 & 1 \\ 3 & 2 & 4 \end{bmatrix}, \quad E = \begin{bmatrix} 6 & 1 & 3 \\ -1 & 1 & 2 \\ 4 & 1 & 3 \end{bmatrix} \]

3. (a) $D + E$  (b) $D - E$  (c) $5A$  (d) $-7C$  (e) $2B - C$  (f) $4E - 2D$
   (g) $-3(D + 2E)$  (h) $A - A$  (i) $\operatorname{tr}(D)$  (j) $\operatorname{tr}(D - 3E)$  (k) $4\operatorname{tr}(7B)$  (l) $\operatorname{tr}(A)$

4. (a) $2A^T + C$  (b) $D^T - E^T$  (c) $(D - E)^T$  (d) $B^T + 5C^T$  (e) $\tfrac{1}{2}C^T - \tfrac{1}{4}A$  (f) $B - B^T$
   (g) $2E^T - 3D^T$  (h) $(2E^T - 3D^T)^T$  (i) $(CD)E$  (j) $C(BA)$  (k) $\operatorname{tr}(DE^T)$  (l) $\operatorname{tr}(BC)$

5. (a) $AB$  (b) $BA$  (c) $(3E)D$  (d) $(AB)C$  (e) $A(BC)$  (f) $CC^T$
   (g) $(DA)^T$  (h) $(C^TB)A^T$  (i) $\operatorname{tr}(DD^T)$  (j) $\operatorname{tr}(4E^T - D)$  (k) $\operatorname{tr}(C^TA^T + 2E^T)$  (l) $\operatorname{tr}((EC^T)^TA)$

6. (a) $(2D^T - E)A$  (b) $(4B)C + 2B$  (c) $(-AC)^T + 5D^T$
   (d) $(BA^T - 2C)^T$  (e) $B^T(CC^T - A^TA)$  (f) $D^TE^T - (ED)^T$

In Exercises 7-8, use the following matrices and either the row method or the column
method, as appropriate, to find the indicated row or column.

\[ A = \begin{bmatrix} 3 & -2 & 7 \\ 6 & 5 & 4 \\ 0 & 4 & 9 \end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix} 6 & -2 & 4 \\ 0 & 1 & 3 \\ 7 & 7 & 5 \end{bmatrix} \]

7. (a) the first row of AB  (b) the third row of AB
   (c) the second column of AB  (d) the first column of BA
   (e) the third row of AA  (f) the third column of AA
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8. (a) the first column of AB 
(c) the second row of BB 
(e) the third column of AB 


(b) the third column of BB 
(d) the first column of AA 
(f ) the first row of BA 


In Exercises 0, use matrices A and B from Exercises 7-8. 


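9. (a) Express each column vector of AA as a linear combination of the column vectors of A.

(b) Express each column vector of BB as a linear combination of the column vectors of B.

10. (a) Express each column vector of AB as a linear combination of the column vectors of A.

(b) Express each column vector of BA as a linear combination of the column vectors of B.

In each part of Exercises 11-12, find matrices A, x, and b that express the given linear system as a single matrix equation Ax = b, and write out this matrix equation.

11. (a)
\[ \begin{aligned} 2x_1 - 3x_2 + 5x_3 &= 7 \\ 9x_1 - x_2 + x_3 &= -1 \\ x_1 + 5x_2 + 4x_3 &= 0 \end{aligned} \]

(b)
\[ \begin{aligned} 4x_1 - 3x_3 + x_4 &= 1 \\ 5x_1 + x_2 - 8x_4 &= 3 \\ 2x_1 - 5x_2 + 9x_3 - x_4 &= 0 \\ 3x_2 - x_3 + 7x_4 &= 2 \end{aligned} \]

12. (a)
\[ \begin{aligned} x_1 - 2x_2 + 3x_3 &= -3 \\ 2x_1 + x_2 &= 0 \\ -3x_2 + 4x_3 &= 1 \\ x_1 + x_3 &= 5 \end{aligned} \]

(b)
\[ \begin{aligned} 3x_1 + 3x_2 + 3x_3 &= -3 \\ -x_1 - 5x_2 - 2x_3 &= 3 \\ -4x_2 + x_3 &= 0 \end{aligned} \]

In each part of Exercises 13-14, express the matrix equation as a system of linear equations.

13. (a) $\begin{bmatrix} 5 & 6 & -7 \\ -1 & -2 & 3 \\ 0 & 4 & -1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 2 \\ 0 \\ 3 \end{bmatrix}$

(b) $\begin{bmatrix} 1 & 1 & 1 \\ 2 & 3 & 0 \\ 5 & -3 & -6 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 2 \\ 2 \\ -9 \end{bmatrix}$

14. (a) $\begin{bmatrix} 3 & -1 & 2 \\ 4 & 3 & 7 \\ -2 & 1 & 5 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 2 \\ -1 \\ 4 \end{bmatrix}$

(b) $\begin{bmatrix} 3 & -2 & 0 & 1 \\ 5 & 0 & 2 & -2 \\ 3 & 1 & 4 & 7 \\ -2 & 5 & 1 & 6 \end{bmatrix} \begin{bmatrix} w \\ x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}$

In Exercises 15-16, find all values of k, if any, that satisfy the equation.

15. $\begin{bmatrix} k & 1 & 1 \end{bmatrix} \begin{bmatrix} 1 & 1 & 0 \\ 1 & 0 & 2 \\ 0 & 2 & -3 \end{bmatrix} \begin{bmatrix} k \\ 1 \\ 1 \end{bmatrix} = 0$

16. $\begin{bmatrix} 2 & 2 & k \end{bmatrix} \begin{bmatrix} 1 & 2 & 0 \\ 2 & 0 & 3 \\ 0 & 3 & 1 \end{bmatrix} \begin{bmatrix} 2 \\ 2 \\ k \end{bmatrix} = 0$

In Exercises 17-20, use the column-row expansion of AB to express this product as a sum of matrices.

17. $A = \begin{bmatrix} 4 & -3 \\ 2 & -1 \end{bmatrix}, \quad B = \begin{bmatrix} 0 & 1 & 2 \\ -2 & 3 & 1 \end{bmatrix}$

18. $A = \begin{bmatrix} 0 & -2 \\ 4 & -3 \end{bmatrix}, \quad B = \begin{bmatrix} 1 & 4 & 1 \\ -3 & 0 & 2 \end{bmatrix}$

19. $A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix}, \quad B = \begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{bmatrix}$

20. $A = \begin{bmatrix} 0 & 4 & 2 \\ 1 & -2 & 5 \end{bmatrix}, \quad B = \begin{bmatrix} 2 & -1 \\ 4 & 0 \\ 1 & -1 \end{bmatrix}$

21. For the linear system in Example 5 of Section 1.2, express the general solution that we obtained in that example as a linear combination of column vectors that contain only numerical entries. [Suggestion: Rewrite the general solution as a single column vector, then write that column vector as a sum of column vectors each of which contains at most one parameter, and then factor out the parameters.]

22. Follow the directions of Exercise 21 for the linear system in Example 6 of Section 1.2.

In Exercises 23-24, solve the matrix equation for a, b, c, and d.

23. $\begin{bmatrix} a & 3 \\ -1 & a + b \end{bmatrix} = \begin{bmatrix} 4 & d - 2c \\ d + 2c & -2 \end{bmatrix}$

24. $\begin{bmatrix} a - b & b + a \\ 3d + c & 2d - c \end{bmatrix} = \begin{bmatrix} 8 & 1 \\ 7 & 6 \end{bmatrix}$

25. (a) Show that if A has a row of zeros and B is any matrix for which AB is defined, then AB also has a row of zeros.

(b) Find a similar result involving a column of zeros.

26. In each part, find a 6 × 6 matrix $[a_{ij}]$ that satisfies the stated condition. Make your answers as general as possible by using letters rather than specific numbers for the nonzero entries.

(a) $a_{ij} = 0$ if $i \neq j$ (b) $a_{ij} = 0$ if $i > j$

(c) $a_{ij} = 0$ if $i < j$ (d) $a_{ij} = 0$ if $|i - j| > 1$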
In Exercises 27-28, how many 3 × 3 matrices A can you find for which the equation is satisfied for all choices of x, y, and z?

27. $A \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} x + y \\ x - y \\ 0 \end{bmatrix}$

28. $A \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} xy \\ 0 \\ 0 \end{bmatrix}$

29. A matrix B is said to be a square root of a matrix A if BB = A.

(a) Find two square roots of $A = \begin{bmatrix} 2 & 2 \\ 2 & 2 \end{bmatrix}$.

(b) How many different square roots can you find of $A = \begin{bmatrix} 5 & 0 \\ 0 & 9 \end{bmatrix}$?

(c) Do you think that every 2 × 2 matrix has at least one square root? Explain your reasoning.

30. Let 0 denote a 2 × 2 matrix, each of whose entries is zero.

(a) Is there a 2 × 2 matrix A such that A ≠ 0 and AA = 0? Justify your answer.

(b) Is there a 2 × 2 matrix A such that A ≠ 0 and AA = A? Justify your answer.

31. Establish Formula (11) by using Formula (5) to show that

\[ (AB)_{ij} = (\mathbf{c}_1\mathbf{r}_1 + \mathbf{c}_2\mathbf{r}_2 + \cdots + \mathbf{c}_r\mathbf{r}_r)_{ij} \]

32. Find a 4 × 4 matrix $A = [a_{ij}]$ whose entries satisfy the stated condition.

(a) $a_{ij} = i + j$ (b) $a_{ij} = i^{\,j-1}$

(c) $a_{ij} = \begin{cases} 1 & \text{if } |i - j| > 1 \\ -1 & \text{if } |i - j| \le 1 \end{cases}$

33. Suppose that type I items cost $1 each, type II items cost $2 each, and type III items cost $3 each. Also, suppose that the accompanying table describes the number of items of each type purchased during the first four months of the year.

Table Ex-33

          Type I   Type II   Type III
Jan.        3        4         3
Feb.        5        6         0
Mar.        2        9         4
Apr.        1        1         7

What information is represented by the following product?

\[ \begin{bmatrix} 3 & 4 & 3 \\ 5 & 6 & 0 \\ 2 & 9 & 4 \\ 1 & 1 & 7 \end{bmatrix} \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} \]

34. The accompanying table shows a record of May and June unit sales for a clothing store. Let M denote the 4 × 3 matrix of May sales and J the 4 × 3 matrix of June sales.

(a) What does the matrix M + J represent?

(b) What does the matrix M − J represent?

(c) Find a column vector x for which Mx provides a list of the number of shirts, jeans, suits, and raincoats sold in May.

(d) Find a row vector y for which yM provides a list of the number of small, medium, and large items sold in May.

(e) Using the matrices x and y that you found in parts (c) and (d), what does yMx represent?

Table Ex-34

May Sales
             Small   Medium   Large
Shirts         45      60       75
Jeans          30      30       40
Suits          12      65       45
Raincoats      15      40       35

June Sales
             Small   Medium   Large
Shirts         30      33       40
Jeans          21      23       25
Suits           9      12       11
Raincoats       8      10        9

Working with Proofs

35. Prove: If A and B are n × n matrices, then
tr(A + B) = tr(A) + tr(B)

36. (a) Prove: If AB and BA are both defined, then AB and BA are square matrices.

(b) Prove: If A is an m × n matrix and A(BA) is defined, then B is an n × m matrix.


True-False Exercises

TF. In parts (a)-(o) determine whether the statement is true or false, and justify your answer.

(a) The matrix $\begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix}$ has no main diagonal.

(b) An m × n matrix has m column vectors and n row vectors.

(c) If A and B are 2 × 2 matrices, then AB = BA.

(d) The ith row vector of a matrix product AB can be computed by multiplying A by the ith row vector of B.

(e) For every matrix A, it is true that $(A^T)^T = A$.

(f) If A and B are square matrices of the same order, then tr(AB) = tr(A)tr(B).

(g) If A and B are square matrices of the same order, then $(AB)^T = A^TB^T$.

(h) For every square matrix A, it is true that $\operatorname{tr}(A^T) = \operatorname{tr}(A)$.

(i) If A is a 6 × 4 matrix and B is an m × n matrix such that $B^TA^T$ is a 2 × 6 matrix, then m = 4 and n = 2.

(j) If A is an n × n matrix and c is a scalar, then tr(cA) = c tr(A).

(k) If A, B, and C are matrices of the same size such that A − C = B − C, then A = B.

(l) If A, B, and C are square matrices of the same order such that AC = BC, then A = B.

(m) If AB + BA is defined, then A and B are square matrices of the same size.

(n) If B has a column of zeros, then so does AB if this product is defined.

(o) If B has a column of zeros, then so does BA if this product is defined.

Working with Technology

T1. (a) Compute the product AB of the matrices in Example 5, and compare your answer to that in the text.

(b) Use your technology utility to extract the columns of A and the rows of B, and then calculate the product AB by a column-row expansion.

T2. Suppose that a manufacturer uses Type I items at $1.35 each, Type II items at $2.15 each, and Type III items at $3.95 each. Suppose also that the accompanying table describes the purchases of those items (in thousands of units) for the first quarter of the year. Write down a matrix product, the computation of which produces a matrix that lists the manufacturer's expenditure in each month of the first quarter. Compute that product.

          Type I   Type II   Type III
Jan.       3.1      4.2        3.5
Feb.       5.1      6.8        0
Mar.       2.2      9.5        4.0
Apr.       1.0      1.0        7.4


1.4 Inverses; Algebraic Properties of Matrices 

In this section we will discuss some of the algebraic properties of matrix operations. We will 
see that many of the basic rules of arithmetic for real numbers hold for matrices, but we will 
also see that some do not. 


Properties of Matrix Addition and Scalar Multiplication

The following theorem lists the basic algebraic properties of the matrix operations.

THEOREM 1.4.1 Properties of Matrix Arithmetic

Assuming that the sizes of the matrices are such that the indicated operations can be performed, the following rules of matrix arithmetic are valid.

(a) A + B = B + A [Commutative law for matrix addition]
(b) A + (B + C) = (A + B) + C [Associative law for matrix addition]
(c) A(BC) = (AB)C [Associative law for matrix multiplication]
(d) A(B + C) = AB + AC [Left distributive law]
(e) (B + C)A = BA + CA [Right distributive law]
(f) A(B − C) = AB − AC
(g) (B − C)A = BA − CA
(h) a(B + C) = aB + aC
(i) a(B − C) = aB − aC
(j) (a + b)C = aC + bC
(k) (a − b)C = aC − bC
(l) a(bC) = (ab)C
(m) a(BC) = (aB)C = B(aC)

To prove any of the equalities in this theorem we must show that the matrix on the left 
side has the same size as that on the right and that the corresponding entries on the two 
sides are the same. Most of the proofs follow the same pattern, so we will prove part 
(d) as a sample. The proof of the associative law for multiplication is more complicated 
than the rest and is outlined in the exercises. 


There are three basic ways to prove that two matrices of the same size are equal: prove that corresponding entries are the same, prove that corresponding row vectors are the same, or prove that corresponding column vectors are the same.

Proof (d) We must show that A(B + C) and AB + AC have the same size and that 
corresponding entries are equal. To form A(B + C), the matrices B and C must have 
the same size, say m × n, and the matrix A must then have m columns, so its size must 
be of the form r × m. This makes A(B + C) an r × n matrix. It follows that AB + AC 
is also an r × n matrix and, consequently, A(B + C) and AB + AC have the same size. 

Suppose that $A = [a_{ij}]$, $B = [b_{ij}]$, and $C = [c_{ij}]$. We want to show that corresponding entries of A(B + C) and AB + AC are equal; that is, 

\[ (A(B + C))_{ij} = (AB + AC)_{ij} \]

for all values of i and j. But from the definitions of matrix addition and matrix multiplication, we have 

\[
\begin{aligned}
(A(B + C))_{ij} &= a_{i1}(b_{1j} + c_{1j}) + a_{i2}(b_{2j} + c_{2j}) + \cdots + a_{im}(b_{mj} + c_{mj}) \\
&= (a_{i1}b_{1j} + a_{i2}b_{2j} + \cdots + a_{im}b_{mj}) + (a_{i1}c_{1j} + a_{i2}c_{2j} + \cdots + a_{im}c_{mj}) \\
&= (AB)_{ij} + (AC)_{ij} = (AB + AC)_{ij}
\end{aligned}
\]

Remark Although the operations of matrix addition and matrix multiplication were defined for 
pairs of matrices, associative laws (b) and (c) enable us to denote sums and products of three 
matrices as A + B + C and ABC without inserting any parentheses. This is justified by the fact 
that no matter how parentheses are inserted, the associative laws guarantee that the same end 
result will be obtained. In general, given any sum or any product of matrices, pairs of parentheses 
can be inserted or deleted anywhere within the expression without affecting the end result. 


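► EXAMPLE 1 Associativity of Matrix Multiplication 

As an illustration of the associative law for matrix multiplication, consider 

\[
A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 0 & 1 \end{bmatrix}, \quad
B = \begin{bmatrix} 4 & 3 \\ 2 & 1 \end{bmatrix}, \quad
C = \begin{bmatrix} 1 & 0 \\ 2 & 3 \end{bmatrix}
\]

Then 

\[
AB = \begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 4 & 3 \\ 2 & 1 \end{bmatrix} = \begin{bmatrix} 8 & 5 \\ 20 & 13 \\ 2 & 1 \end{bmatrix}
\quad \text{and} \quad
BC = \begin{bmatrix} 4 & 3 \\ 2 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 2 & 3 \end{bmatrix} = \begin{bmatrix} 10 & 9 \\ 4 & 3 \end{bmatrix}
\]

Thus 

\[
(AB)C = \begin{bmatrix} 8 & 5 \\ 20 & 13 \\ 2 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 2 & 3 \end{bmatrix} = \begin{bmatrix} 18 & 15 \\ 46 & 39 \\ 4 & 3 \end{bmatrix}
\]

and 

\[
A(BC) = \begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 10 & 9 \\ 4 & 3 \end{bmatrix} = \begin{bmatrix} 18 & 15 \\ 46 & 39 \\ 4 & 3 \end{bmatrix}
\]

so (AB)C = A(BC), as guaranteed by Theorem 1.4.1(c). ◄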
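The computation in Example 1 can be reproduced mechanically. The sketch below is one way to do so in plain Python, assuming matrices are stored as lists of rows; the helper `matmul` is an illustrative name, not something defined in the text.

```python
def matmul(A, B):
    """Multiply matrices stored as lists of rows (row-column rule)."""
    if len(A[0]) != len(B):
        raise ValueError("inner dimensions do not match")
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

# The matrices of Example 1.
A = [[1, 2], [3, 4], [0, 1]]
B = [[4, 3], [2, 1]]
C = [[1, 0], [2, 3]]

left = matmul(matmul(A, B), C)   # (AB)C
right = matmul(A, matmul(B, C))  # A(BC)
print(left)           # [[18, 15], [46, 39], [4, 3]]
print(left == right)  # True
```

A single numerical check like this illustrates the associative law but does not prove it; the proof is the one outlined in the exercises.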


Properties of Matrix Multiplication 

Do not let Theorem 1.4.1 lull you into believing that all laws of real arithmetic carry over 
to matrix arithmetic. For example, you know that in real arithmetic it is always true that 
ab = ba, which is called the commutative law for multiplication. In matrix arithmetic, 
however, the equality of AB and BA can fail for three possible reasons: 

1. AB may be defined and BA may not (for example, if A is 2 × 3 and B is 3 × 4). 

2. AB and BA may both be defined, but they may have different sizes (for example, if 
A is 2 × 3 and B is 3 × 2). 

3. AB and BA may both be defined and have the same size, but the two products may 
be different (as illustrated in the next example). 

► EXAMPLE 2 Order Matters in Matrix Multiplication 

Consider the matrices 

\[
A = \begin{bmatrix} -1 & 0 \\ 2 & 3 \end{bmatrix} \quad \text{and} \quad B = \begin{bmatrix} 1 & 2 \\ 3 & 0 \end{bmatrix}
\]

Multiplying gives 

\[
AB = \begin{bmatrix} -1 & -2 \\ 11 & 4 \end{bmatrix} \quad \text{and} \quad BA = \begin{bmatrix} 3 & 6 \\ -3 & 0 \end{bmatrix}
\]

Thus, AB ≠ BA. ◄ 

Do not read too much into Example 2; it does not rule out the possibility that AB and BA may be equal in certain cases, just that they are not equal in all cases. If it so happens that AB = BA, then we say that A and B commute. 

Zero Matrices 

A matrix whose entries are all zero is called a zero matrix. Some examples are 

\[
\begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}, \quad
\begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}, \quad
\begin{bmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}, \quad
\begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}, \quad
[0]
\]

We will denote a zero matrix by 0 unless it is important to specify its size, in which case 
we will denote the m × n zero matrix by $0_{m \times n}$. 

It should be evident that if A and 0 are matrices with the same size, then 

A + 0 = 0 + A = A 

Thus, 0 plays the same role in this matrix equation that the number 0 plays in the 
numerical equation a + 0 = 0 + a = a. 

The following theorem lists the basic properties of zero matrices. Since the results 
should be self-evident, we will omit the formal proofs. 

THEOREM 1.4.2 Properties of Zero Matrices 

If c is a scalar, and if the sizes of the matrices are such that the operations can be 
performed, then: 

(a) A + 0 = 0 + A = A 

(b) A − 0 = A 

(c) A − A = A + (−A) = 0 

(d) 0A = 0 

(e) If cA = 0, then c = 0 or A = 0. 
Since we know that the commutative law of real arithmetic is not valid in matrix 
arithmetic, it should not be surprising that there are other rules that fail as well. For 
example, consider the following two laws of real arithmetic: 

If ab = ac and a ≠ 0, then b = c. [The cancellation law] 

If ab = 0, then at least one of the factors on the left is 0. 

The next two examples show that these laws are not true in matrix arithmetic. 

► EXAMPLE 3 Failure of the Cancellation Law 

Consider the matrices 

\[
A = \begin{bmatrix} 0 & 1 \\ 0 & 2 \end{bmatrix}, \quad
B = \begin{bmatrix} 1 & 1 \\ 3 & 4 \end{bmatrix}, \quad
C = \begin{bmatrix} 2 & 5 \\ 3 & 4 \end{bmatrix}
\]

We leave it for you to confirm that 

\[
AB = AC = \begin{bmatrix} 3 & 4 \\ 6 & 8 \end{bmatrix}
\]

Although A ≠ 0, canceling A from both sides of the equation AB = AC would lead 
to the incorrect conclusion that B = C. Thus, the cancellation law does not hold, in 
general, for matrix multiplication (though there may be particular cases where it is true). ◄ 

► EXAMPLE 4 A Zero Product with Nonzero Factors 

Here are two matrices for which AB = 0, but A ≠ 0 and B ≠ 0: 

\[
A = \begin{bmatrix} 0 & 1 \\ 0 & 2 \end{bmatrix}, \quad B = \begin{bmatrix} 3 & 7 \\ 0 & 0 \end{bmatrix}
\]
◄ 

Identity Matrices 

A square matrix with 1's on the main diagonal and zeros elsewhere is called an identity 
matrix. Some examples are 

\[
[1], \quad
\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \quad
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad
\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}
\]

An identity matrix is denoted by the letter I. If it is important to emphasize the size, we 
will write $I_n$ for the n × n identity matrix. 

To explain the role of identity matrices in matrix arithmetic, let us consider the effect 
of multiplying a general 2 × 3 matrix A on each side by an identity matrix. Multiplying 
on the right by the 3 × 3 identity matrix yields 

\[
AI_3 = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
= \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{bmatrix} = A
\]

and multiplying on the left by the 2 × 2 identity matrix yields 

\[
I_2A = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{bmatrix}
= \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{bmatrix} = A
\]

The same result holds in general; that is, if A is any m × n matrix, then 

\[ AI_n = A \quad \text{and} \quad I_mA = A \]

Thus, the identity matrices play the same role in matrix arithmetic that the number 1 
plays in the numerical equation a · 1 = 1 · a = a. 

As the next theorem shows, identity matrices arise naturally in studying reduced row 
echelon forms of square matrices. 

THEOREM 1.4.3 If R is the reduced row echelon form of an n × n matrix A, then either 
R has a row of zeros or R is the identity matrix $I_n$. 

Proof Suppose that the reduced row echelon form of A is 

\[
R = \begin{bmatrix} r_{11} & r_{12} & \cdots & r_{1n} \\ r_{21} & r_{22} & \cdots & r_{2n} \\ \vdots & \vdots & & \vdots \\ r_{n1} & r_{n2} & \cdots & r_{nn} \end{bmatrix}
\]

Either the last row in this matrix consists entirely of zeros or it does not. If not, the 
matrix contains no zero rows, and consequently each of the n rows has a leading entry 
of 1. Since these leading 1's occur progressively farther to the right as we move down 
the matrix, each of these 1's must occur on the main diagonal. Since the other entries in 
the same column as one of these 1's are zero, R must be $I_n$. Thus, either R has a row of 
zeros or $R = I_n$. 

Inverse of a Matrix 

In real arithmetic every nonzero number a has a reciprocal $a^{-1}$ (= 1/a) with the property 

\[ a \cdot a^{-1} = a^{-1} \cdot a = 1 \]

The number $a^{-1}$ is sometimes called the multiplicative inverse of a. Our next objective is 
to develop an analog of this result for matrix arithmetic. For this purpose we make the 
following definition. 

DEFINITION 1 If A is a square matrix, and if a matrix B of the same size can be 
found such that AB = BA = I, then A is said to be invertible (or nonsingular) and 
B is called an inverse of A. If no such matrix B can be found, then A is said to be 
singular. 

The relationship AB = BA = I is not changed by interchanging A and B, so if A is 
invertible and B is an inverse of A, then it is also true that B is invertible, and A is an inverse of 
B. Thus, when 

AB = BA = I 

we say that A and B are inverses of one another. 
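Definition 1 translates directly into a test: form both products and compare each with the identity matrix. The following is a minimal sketch under the assumption that matrices are lists of rows with integer entries, so that the comparisons are exact; the names `matmul`, `identity`, and `are_inverses` are illustrative only.

```python
def matmul(A, B):
    """Multiply matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def identity(n):
    """Return the n x n identity matrix."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def are_inverses(A, B):
    """True when A and B are square, same size, and AB = BA = I."""
    n = len(A)
    square = (all(len(row) == n for row in A)
              and len(B) == n
              and all(len(row) == n for row in B))
    if not square:
        return False
    return matmul(A, B) == identity(n) and matmul(B, A) == identity(n)

A = [[2, -5], [-1, 3]]
B = [[3, 5], [1, 2]]
print(are_inverses(A, B))  # True
print(are_inverses(A, A))  # False
```

Both products are checked because the definition requires AB = I and BA = I; the function makes no attempt to find an inverse, only to verify a proposed one.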


► EXAMPLE 5 An Invertible Matrix 

Let 

\[
A = \begin{bmatrix} 2 & -5 \\ -1 & 3 \end{bmatrix} \quad \text{and} \quad B = \begin{bmatrix} 3 & 5 \\ 1 & 2 \end{bmatrix}
\]

Then 

\[
AB = \begin{bmatrix} 2 & -5 \\ -1 & 3 \end{bmatrix} \begin{bmatrix} 3 & 5 \\ 1 & 2 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = I
\]

\[
BA = \begin{bmatrix} 3 & 5 \\ 1 & 2 \end{bmatrix} \begin{bmatrix} 2 & -5 \\ -1 & 3 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = I
\]

Thus, A and B are invertible and each is an inverse of the other. ◄ 

► EXAMPLE 6 A Class of Singular Matrices 

A square matrix with a row or column of zeros is singular. To help understand why this 
is so, consider the matrix 

\[
A = \begin{bmatrix} 1 & 4 & 0 \\ 2 & 5 & 0 \\ 3 & 6 & 0 \end{bmatrix}
\]

To prove that A is singular we must show that there is no 3 × 3 matrix B such that 
AB = BA = I. For this purpose let $\mathbf{c}_1, \mathbf{c}_2, \mathbf{0}$ be the column vectors of A. Thus, for any 
3 × 3 matrix B we can express the product BA as 

\[
BA = B[\mathbf{c}_1 \;\; \mathbf{c}_2 \;\; \mathbf{0}] = [B\mathbf{c}_1 \;\; B\mathbf{c}_2 \;\; \mathbf{0}] \qquad \text{[Formula (6) of Section 1.3]}
\]

The column of zeros shows that BA ≠ I and hence that A is singular. ◄ 

As in Example 6, we will frequently denote a zero matrix with one row or one column by a boldface zero. 

Properties of Inverses 

It is reasonable to ask whether an invertible matrix can have more than one inverse. The 
next theorem shows that the answer is no: an invertible matrix has exactly one inverse. 

THEOREM 1.4.4 If B and C are both inverses of the matrix A, then B = C. 

Proof Since B is an inverse of A, we have BA = I. Multiplying both sides on the right 
by C gives (BA)C = IC = C. But it is also true that (BA)C = B(AC) = BI = B, so 
C = B. 

As a consequence of this important result, we can now speak of "the" inverse of an 
invertible matrix. If A is invertible, then its inverse will be denoted by the symbol $A^{-1}$. 
Thus, 

\[ AA^{-1} = I \quad \text{and} \quad A^{-1}A = I \qquad (1) \]

WARNING The symbol $A^{-1}$ should not be interpreted as 1/A. Division by matrices will not be a defined operation in this text. 

The inverse of A plays much the same role in matrix arithmetic that the reciprocal $a^{-1}$ 
plays in the numerical relationships $aa^{-1} = 1$ and $a^{-1}a = 1$. 

In the next section we will develop a method for computing the inverse of an invertible 
matrix of any size. For now we give the following theorem that specifies conditions under 
which a 2 x 2 matrix is invertible and provides a simple formula for its inverse. 


THEOREM 1.4.5 The matrix 

\[ A = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \]

is invertible if and only if ad − bc ≠ 0, in which case the inverse is given by the formula 

\[ A^{-1} = \frac{1}{ad - bc} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix} \qquad (2) \]

We will omit the proof, because we will study a more general version of this theorem 
later. For now, you should at least confirm the validity of Formula (2) by showing that 
$AA^{-1} = A^{-1}A = I$. 

The quantity ad − bc in Theorem 1.4.5 is called the determinant of the 2 × 2 matrix A and is denoted by 

\[ \det(A) = ad - bc \]

or alternatively by 

\[ \begin{vmatrix} a & b \\ c & d \end{vmatrix} = ad - bc \]

Figure 1.4.1: $\det(A) = \begin{vmatrix} a & b \\ c & d \end{vmatrix} = ad - bc$ 

Historical Note The formula for $A^{-1}$ given in Theorem 1.4.5 first appeared (in a more general form) 
in Arthur Cayley's 1858 Memoir on the Theory of Matrices. The more general result that Cayley 
discovered will be studied later. 

Remark Figure 1.4.1 illustrates that the determinant of a 2 × 2 matrix A is the product of the 
entries on its main diagonal minus the product of the entries off its main diagonal. 
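Formula (2) is short enough to code directly. The sketch below is a hedged illustration rather than a general-purpose routine: it assumes a 2 × 2 matrix stored as a list of two rows with integer (or `Fraction`) entries, and it uses Python's `fractions.Fraction` so that the entries of the inverse stay exact. The names `det2` and `inverse2` are illustrative.

```python
from fractions import Fraction

def det2(A):
    """Determinant ad - bc of the 2 x 2 matrix [[a, b], [c, d]]."""
    (a, b), (c, d) = A
    return a * d - b * c

def inverse2(A):
    """Inverse of a 2 x 2 matrix by Formula (2), or None if ad - bc = 0."""
    (a, b), (c, d) = A
    det = det2(A)
    if det == 0:
        return None  # singular: Theorem 1.4.5 gives no inverse
    k = Fraction(1, det)
    return [[k * d, -k * b], [-k * c, k * a]]

A = [[2, -5], [-1, 3]]  # the matrix A of Example 5
print(inverse2(A) == [[3, 5], [1, 2]])  # True
print(inverse2([[1, 2], [2, 4]]))       # None
```

Returning `None` for the singular case is a design choice made for this sketch; raising an exception would be an equally reasonable convention.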


► EXAMPLE 7 Calculating the Inverse of a 2 × 2 Matrix 

In each part, determine whether the matrix is invertible. If so, find its inverse. 

(a) $A = \begin{bmatrix} 6 & 1 \\ 5 & 2 \end{bmatrix}$ 
(b) $A = \begin{bmatrix} -1 & 2 \\ 3 & -6 \end{bmatrix}$ 

Solution (a) The determinant of A is det(A) = (6)(2) − (1)(5) = 7, which is nonzero. 
Thus, A is invertible, and its inverse is 

\[
A^{-1} = \frac{1}{7} \begin{bmatrix} 2 & -1 \\ -5 & 6 \end{bmatrix} = \begin{bmatrix} \tfrac{2}{7} & -\tfrac{1}{7} \\ -\tfrac{5}{7} & \tfrac{6}{7} \end{bmatrix}
\]

We leave it for you to confirm that $AA^{-1} = A^{-1}A = I$. 

Solution (b) The matrix is not invertible since det(A) = (−1)(−6) − (2)(3) = 0. ◄ 


► EXAMPLE 8 Solution of a Linear System by Matrix Inversion 

A problem that arises in many applications is to solve a pair of equations of the form 

\[ \begin{aligned} u &= ax + by \\ v &= cx + dy \end{aligned} \]

for x and y in terms of u and v. One approach is to treat this as a linear system of 
two equations in the unknowns x and y and use Gauss-Jordan elimination to solve 
for x and y. However, because the coefficients of the unknowns are literal rather than 
numerical, this procedure is a little clumsy. As an alternative approach, let us replace the 
two equations by the single matrix equation 

\[ \begin{bmatrix} u \\ v \end{bmatrix} = \begin{bmatrix} ax + by \\ cx + dy \end{bmatrix} \]

which we can rewrite as 

\[ \begin{bmatrix} u \\ v \end{bmatrix} = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} \]

If we assume that the 2 × 2 matrix is invertible (i.e., ad − bc ≠ 0), then we can multiply 
through on the left by the inverse and rewrite the equation as 

\[ \begin{bmatrix} a & b \\ c & d \end{bmatrix}^{-1} \begin{bmatrix} u \\ v \end{bmatrix} = \begin{bmatrix} a & b \\ c & d \end{bmatrix}^{-1} \begin{bmatrix} a & b \\ c & d \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} \]

which simplifies to 

\[ \begin{bmatrix} a & b \\ c & d \end{bmatrix}^{-1} \begin{bmatrix} u \\ v \end{bmatrix} = \begin{bmatrix} x \\ y \end{bmatrix} \]

Using Theorem 1.4.5, we can rewrite this equation as 

\[ \frac{1}{ad - bc} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix} \begin{bmatrix} u \\ v \end{bmatrix} = \begin{bmatrix} x \\ y \end{bmatrix} \]

from which we obtain 

\[ x = \frac{du - bv}{ad - bc}, \qquad y = \frac{av - cu}{ad - bc} \]
◄ 

The next theorem is concerned with inverses of matrix products. 

THEOREM 1.4.6 If A and B are invertible matrices with the same size, then AB is 
invertible and 

\[ (AB)^{-1} = B^{-1}A^{-1} \]

Proof We can establish the invertibility and obtain the stated formula at the same time 
by showing that 

\[ (AB)(B^{-1}A^{-1}) = (B^{-1}A^{-1})(AB) = I \]

But 

\[ (AB)(B^{-1}A^{-1}) = A(BB^{-1})A^{-1} = AIA^{-1} = AA^{-1} = I \]

and similarly, $(B^{-1}A^{-1})(AB) = I$. 

Although we will not prove it, this result can be extended to three or more factors: 

A product of any number of invertible matrices is invertible, and the inverse of the product 
is the product of the inverses in the reverse order. 

If a product of matrices is singular, then at least one of the factors must be singular. Why? 

► EXAMPLE 9 The Inverse of a Product 

Consider the matrices 

\[ A = \begin{bmatrix} 1 & 2 \\ 1 & 3 \end{bmatrix}, \quad B = \begin{bmatrix} 3 & 2 \\ 2 & 2 \end{bmatrix} \]

We leave it for you to show that 

\[ AB = \begin{bmatrix} 7 & 6 \\ 9 & 8 \end{bmatrix}, \quad (AB)^{-1} = \begin{bmatrix} 4 & -3 \\ -\tfrac{9}{2} & \tfrac{7}{2} \end{bmatrix} \]

and also that 

\[
A^{-1} = \begin{bmatrix} 3 & -2 \\ -1 & 1 \end{bmatrix}, \quad
B^{-1} = \begin{bmatrix} 1 & -1 \\ -1 & \tfrac{3}{2} \end{bmatrix}, \quad
B^{-1}A^{-1} = \begin{bmatrix} 1 & -1 \\ -1 & \tfrac{3}{2} \end{bmatrix} \begin{bmatrix} 3 & -2 \\ -1 & 1 \end{bmatrix} = \begin{bmatrix} 4 & -3 \\ -\tfrac{9}{2} & \tfrac{7}{2} \end{bmatrix}
\]

Thus, $(AB)^{-1} = B^{-1}A^{-1}$ as guaranteed by Theorem 1.4.6. ◄ 

Powers of a Matrix 

If A is a square matrix, then we define the nonnegative integer powers of A to be 

\[ A^0 = I \quad \text{and} \quad A^n = AA \cdots A \quad [n \text{ factors}] \]

and if A is invertible, then we define the negative integer powers of A to be 

\[ A^{-n} = (A^{-1})^n = A^{-1}A^{-1} \cdots A^{-1} \quad [n \text{ factors}] \]
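The two definitions of powers can be turned into a short routine. The following sketch is restricted to 2 × 2 matrices with integer entries so that the inverse can be taken from Formula (2) of Theorem 1.4.5; the helper names `matmul`, `identity`, `inverse2`, and `power` are assumptions made for the illustration.

```python
from fractions import Fraction

def matmul(A, B):
    """Multiply matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def inverse2(A):
    """Inverse of a 2 x 2 matrix by Formula (2), or None if singular."""
    (a, b), (c, d) = A
    det = a * d - b * c
    if det == 0:
        return None
    k = Fraction(1, det)
    return [[k * d, -k * b], [-k * c, k * a]]

def power(A, n):
    """A^n for a 2 x 2 matrix; for n < 0 this is (A^{-1})^{|n|}."""
    base = A if n >= 0 else inverse2(A)
    if base is None:
        raise ValueError("negative powers require an invertible matrix")
    result = identity(2)  # A^0 = I
    for _ in range(abs(n)):
        result = matmul(result, base)
    return result

A = [[1, 2], [1, 3]]
print(power(A, 3) == [[11, 30], [15, 41]])              # True
print(power(A, -3) == [[41, -30], [-15, 11]])           # True
print(matmul(power(A, 3), power(A, -3)) == identity(2)) # True
```

The loop multiplies one factor at a time, mirroring the "n factors" in the definitions; it is not meant to be efficient for large exponents.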
Because these definitions parallel those for real numbers, the usual laws of nonnegative 
exponents hold; for example, 

\[ A^rA^s = A^{r+s} \quad \text{and} \quad (A^r)^s = A^{rs} \]

In addition, we have the following properties of negative exponents. 

THEOREM 1.4.7 If A is invertible and n is a nonnegative integer, then: 

(a) $A^{-1}$ is invertible and $(A^{-1})^{-1} = A$. 

(b) $A^n$ is invertible and $(A^n)^{-1} = A^{-n} = (A^{-1})^n$. 

(c) kA is invertible for any nonzero scalar k, and $(kA)^{-1} = k^{-1}A^{-1}$. 

We will prove part (c) and leave the proofs of parts (a) and (b) as exercises. 

Proof (c) Properties (m) and (l) of Theorem 1.4.1 imply that 

\[ (kA)(k^{-1}A^{-1}) = k^{-1}(kA)A^{-1} = (k^{-1}k)AA^{-1} = (1)I = I \]

and similarly, $(k^{-1}A^{-1})(kA) = I$. Thus, kA is invertible and $(kA)^{-1} = k^{-1}A^{-1}$. 

► EXAMPLE 10 Properties of Exponents 

Let A and $A^{-1}$ be the matrices in Example 9; that is, 

\[ A = \begin{bmatrix} 1 & 2 \\ 1 & 3 \end{bmatrix} \quad \text{and} \quad A^{-1} = \begin{bmatrix} 3 & -2 \\ -1 & 1 \end{bmatrix} \]

Then 

\[
A^{-3} = (A^{-1})^3 = \begin{bmatrix} 3 & -2 \\ -1 & 1 \end{bmatrix} \begin{bmatrix} 3 & -2 \\ -1 & 1 \end{bmatrix} \begin{bmatrix} 3 & -2 \\ -1 & 1 \end{bmatrix} = \begin{bmatrix} 41 & -30 \\ -15 & 11 \end{bmatrix}
\]

Also, 

\[
A^3 = \begin{bmatrix} 1 & 2 \\ 1 & 3 \end{bmatrix} \begin{bmatrix} 1 & 2 \\ 1 & 3 \end{bmatrix} \begin{bmatrix} 1 & 2 \\ 1 & 3 \end{bmatrix} = \begin{bmatrix} 11 & 30 \\ 15 & 41 \end{bmatrix}
\]

so, as expected from Theorem 1.4.7(b), 

\[
(A^3)^{-1} = \frac{1}{(11)(41) - (30)(15)} \begin{bmatrix} 41 & -30 \\ -15 & 11 \end{bmatrix} = \begin{bmatrix} 41 & -30 \\ -15 & 11 \end{bmatrix} = (A^{-1})^3
\]
◄ 

► EXAMPLE 11 The Square of a Matrix Sum 

In real arithmetic, where we have a commutative law for multiplication, we can write 

\[ (a + b)^2 = a^2 + ab + ba + b^2 = a^2 + ab + ab + b^2 = a^2 + 2ab + b^2 \]

However, in matrix arithmetic, where we have no commutative law for multiplication, 
the best we can do is to write 

\[ (A + B)^2 = A^2 + AB + BA + B^2 \]

It is only in the special case where A and B commute (i.e., AB = BA) that we can go a 
step further and write 

\[ (A + B)^2 = A^2 + 2AB + B^2 \]
◄ 

Matrix Polynomials 

If A is a square matrix, say n × n, and if 

\[ p(x) = a_0 + a_1x + a_2x^2 + \cdots + a_mx^m \]

is any polynomial, then we define the n × n matrix p(A) to be 

\[ p(A) = a_0I + a_1A + a_2A^2 + \cdots + a_mA^m \qquad (3) \]

where I is the n × n identity matrix; that is, p(A) is obtained by substituting A for x 
and replacing the constant term $a_0$ by the matrix $a_0I$. An expression of form (3) is called 
a matrix polynomial in A. 

► EXAMPLE 12 A Matrix Polynomial 

Find p(A) for 

\[ p(x) = x^2 - 2x - 3 \quad \text{and} \quad A = \begin{bmatrix} -1 & 2 \\ 0 & 3 \end{bmatrix} \]

Solution 

\[
\begin{aligned}
p(A) &= A^2 - 2A - 3I \\
&= \begin{bmatrix} -1 & 2 \\ 0 & 3 \end{bmatrix}^2 - 2\begin{bmatrix} -1 & 2 \\ 0 & 3 \end{bmatrix} - 3\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \\
&= \begin{bmatrix} 1 & 4 \\ 0 & 9 \end{bmatrix} - \begin{bmatrix} -2 & 4 \\ 0 & 6 \end{bmatrix} - \begin{bmatrix} 3 & 0 \\ 0 & 3 \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}
\end{aligned}
\]

or more briefly, p(A) = 0. ◄ 

Remark It follows from the fact that $A^rA^s = A^{r+s} = A^{s+r} = A^sA^r$ that powers of a square 
matrix commute, and since a matrix polynomial in A is built up from powers of A, any two matrix 
polynomials in A also commute; that is, for any polynomials $p_1$ and $p_2$ we have 

\[ p_1(A)p_2(A) = p_2(A)p_1(A) \qquad (4) \]

Properties of the Transpose 

The following theorem lists the main properties of the transpose. 


THEOREM 1.4.8 If the sizes of the matrices are such that the stated operations can be 
performed, then: 

(a) $(A^T)^T = A$ 

(b) $(A + B)^T = A^T + B^T$ 

(c) $(A - B)^T = A^T - B^T$ 

(d) $(kA)^T = kA^T$ 

(e) $(AB)^T = B^TA^T$ 

If you keep in mind that transposing a matrix interchanges its rows and columns, then 
you should have little trouble visualizing the results in parts (a)-(d). For example, part 
(a) states the obvious fact that interchanging rows and columns twice leaves a matrix 
unchanged; and part ( b ) states that adding two matrices and then interchanging the 
rows and columns produces the same result as interchanging the rows and columns 
before adding. We will omit the formal proofs. Part (e) is less obvious, but for brevity 
we will omit its proof as well. The result in that part can be extended to three or more 
factors and restated as: 


The transpose of a product of any number of matrices is the product of the transposes 
in the reverse order. 
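The reverse-order rule is easy to spot-check numerically. The sketch below uses plain Python with matrices as lists of rows; the helpers `transpose` and `matmul` and the sample matrices are illustrative choices, not taken from the text.

```python
def transpose(M):
    """Interchange the rows and columns of M."""
    return [list(col) for col in zip(*M)]

def matmul(A, B):
    """Multiply matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

A = [[1, 2, 3], [4, 5, 6]]     # 2 x 3
B = [[1, 0], [2, -1], [0, 3]]  # 3 x 2
C = [[2, 1], [1, 1]]           # 2 x 2

# Two factors: (AB)^T = B^T A^T
lhs = transpose(matmul(A, B))
rhs = matmul(transpose(B), transpose(A))
print(lhs == rhs)  # True

# Three factors: (ABC)^T = C^T B^T A^T
lhs3 = transpose(matmul(matmul(A, B), C))
rhs3 = matmul(transpose(C), matmul(transpose(B), transpose(A)))
print(lhs3 == rhs3)  # True
```

Note that with these sizes the product in the original order, $A^TB^T$, is a 3 × 3 matrix while $(AB)^T$ is 2 × 2, so the order of the factors cannot simply be kept.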
The following theorem establishes a relationship between the inverse of a matrix and 
the inverse of its transpose. 

THEOREM 1.4.9 If A is an invertible matrix, then $A^T$ is also invertible and 

\[ (A^T)^{-1} = (A^{-1})^T \]

Proof We can establish the invertibility and obtain the formula at the same time by 
showing that 

\[ A^T(A^{-1})^T = (A^{-1})^TA^T = I \]

But from part (e) of Theorem 1.4.8 and the fact that $I^T = I$, we have 

\[
\begin{aligned}
A^T(A^{-1})^T &= (A^{-1}A)^T = I^T = I \\
(A^{-1})^TA^T &= (AA^{-1})^T = I^T = I
\end{aligned}
\]

which completes the proof. 

► EXAMPLE 13 Inverse of a Transpose 

Consider a general 2 × 2 invertible matrix and its transpose: 

\[ A = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \quad \text{and} \quad A^T = \begin{bmatrix} a & c \\ b & d \end{bmatrix} \]

Since A is invertible, its determinant ad − bc is nonzero. But the determinant of $A^T$ is 
also ad − bc (verify), so $A^T$ is also invertible. It follows from Theorem 1.4.5 that 

\[
(A^T)^{-1} = \begin{bmatrix} \dfrac{d}{ad - bc} & -\dfrac{c}{ad - bc} \\[2ex] -\dfrac{b}{ad - bc} & \dfrac{a}{ad - bc} \end{bmatrix}
\]

which is the same matrix that results if $A^{-1}$ is transposed (verify). Thus, 

\[ (A^T)^{-1} = (A^{-1})^T \]

as guaranteed by Theorem 1.4.9. ◄ 
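The two "verify" steps in Example 13 can also be checked on a numerical matrix. This is a small sketch assuming 2 × 2 matrices with integer entries stored as lists of rows; `inverse2` applies Formula (2) of Theorem 1.4.5 with exact fractions, and both helper names are illustrative.

```python
from fractions import Fraction

def transpose(M):
    """Interchange the rows and columns of M."""
    return [list(col) for col in zip(*M)]

def inverse2(A):
    """Inverse of a 2 x 2 matrix by Formula (2), or None if singular."""
    (a, b), (c, d) = A
    det = a * d - b * c
    if det == 0:
        return None
    k = Fraction(1, det)
    return [[k * d, -k * b], [-k * c, k * a]]

A = [[6, 1], [5, 2]]  # determinant 7, so A is invertible

inverse_of_transpose = inverse2(transpose(A))
transpose_of_inverse = transpose(inverse2(A))
print(inverse_of_transpose == transpose_of_inverse)  # True
```

Using `Fraction` keeps the entries 2/7, −1/7, −5/7, 6/7 exact, so the equality test is not affected by rounding.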


Exercise Set 1.4 

In Exercises 1-2, verify that the following matrices and scalars 
satisfy the stated properties of Theorem 1.4.1. 


A = 


3 

2 



2 

-4 


C = 


4 

-3 



a = 4, b = −7 


1. (a) The associative law for matrix addition. 

(b) The associative law for matrix multiplication. 

(c) The left distributive law. 

(d) (a + b)C = aC + bC 


2. (a) a(BC) = (aB)C = B(aC) 

(b) A(B − C) = AB − AC (c) (B + C)A = BA + CA 
(d) a(bC) = (ab)C 

In Exercises 3-4, verify that the matrices and scalars in Exercise 1 satisfy the stated properties. 

3. (a) $(A^T)^T = A$ (b) $(AB)^T = B^TA^T$ 

4. (a) $(A + B)^T = A^T + B^T$ (b) $(aC)^T = aC^T$ 

In Exercises 5-8, use Theorem 1.4.5 to compute the inverse of 
the matrix. 



f2 

co 

1 


1 

co 

II 

IT) 

1 

i 

6. B = 

_5 2_ 
7. 



O' 

3 


8. $D = \begin{bmatrix} 6 & 4 \\ -2 & -1 \end{bmatrix}$ 


In Exercises 25-28, use the method of Example 8 to find the 
unique solution of the given linear system. 


9. Find the inverse of 

\(e x + e~ x ) \(e x — e~ x ) 

\(e x -e ~ x ) \(e x + e~ x ) 

10. Find the inverse of 

cos 9 sin 9 
— sin 9 cos 9 


25. $3x_1 - 2x_2 = -1$ 
$4x_1 + 5x_2 = 3$ 

26. $-x_1 + 5x_2 = 4$ 
$-x_1 - 3x_2 = 1$ 

27. $6x_1 + x_2 = 0$ 
$4x_1 - 3x_2 = -2$ 

28. $2x_1 - 2x_2 = 4$ 
$x_1 + 4x_2 = 4$ 

If a polynomial $p(x)$ can be factored as a product of lower 
degree polynomials, say 

$$p(x) = p_1(x)p_2(x)$$

and if A is a square matrix, then it can be proved that 

$$p(A) = p_1(A)p_2(A)$$

In Exercises 29-30, verify this statement for the stated matrix A 
and polynomials 

$$p(x) = x^2 - 9, \quad p_1(x) = x + 3, \quad p_2(x) = x - 3$$

In Exercises 11-14, verify that the equations are valid for the 
matrices in Exercises 5-8. 

11. $(A^T)^{-1} = (A^{-1})^T$ 12. $(A^{-1})^{-1} = A$ 

13. $(ABC)^{-1} = C^{-1}B^{-1}A^{-1}$ 14. $(ABC)^T = C^T B^T A^T$ 

In Exercises 15-18, use the given information to find A. 


15. (7A)" 1 = 


17. (/ + 2A)-‘ = 

In Exercises 19-20, compute the following using the given matrix A. 

(a) $A^3$ (b) $A^{-1}$ (c) $A^2 - 2A + I$ 



r 3 n 


'2 0" 

1° 

te- 

ll 

i 

K> 

l 

II 

© 

<N 

_4 1 


In Exercises 21-22, compute $p(A)$ for the given matrix A and 
the following polynomials. 

(a) $p(x) = x - 2$ 

(b) $p(x) = 2x^2 - x + 1$ 

(c) $p(x) = x^3 - 2x + 1$ 



r 3 n 


'2 0" 

II 

rl 

i 

K> 

l 

II 

ri 

r* 

_4 1_ 


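In Exercises 23-24, let 

$$A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}, \quad B = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}, \quad C = \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix}$$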


23. Find all values of a, b, c, and d (if any) for which the matrices 
A and B commute. 

24. Find all values of a, b, c, and d (if any) for which the matrices 
A and C commute. 


29. The matrix A in Exercise 21 . 

30. An arbitrary square matrix A. 

31. (a) Give an example of two 2 × 2 matrices such that 

$$(A + B)(A - B) \neq A^2 - B^2$$

(b) State a valid formula for multiplying out 

$$(A + B)(A - B)$$

(c) What condition can you impose on A and B that will allow 
you to write $(A + B)(A - B) = A^2 - B^2$? 

32. The numerical equation $a^2 = 1$ has exactly two solutions. 
Find at least eight solutions of the matrix equation $A^2 = I_3$. 
[Hint: Look for solutions in which all entries off the main 
diagonal are zero.] 

33. (a) Show that if a square matrix A satisfies the equation 
$A^2 + 2A + I = 0$, then A must be invertible. What is the 
inverse? 

(b) Show that if $p(x)$ is a polynomial with a nonzero constant 
term, and if A is a square matrix for which $p(A) = 0$, then 
A is invertible. 

34. Is it possible for $A^3$ to be an identity matrix without A being 
invertible? Explain. 

35. Can a matrix with a row of zeros or a column of zeros have an 
inverse? Explain. 

36. Can a matrix with two identical rows or two identical columns 
have an inverse? Explain. 


16. (5A r r‘ = 


18. A 1 = 
In Exercises 37-38, determine whether A is invertible, and if 
so, find the inverse. [Hint: Solve $AX = I$ for X by equating 
corresponding entries on the two sides.] 



"i 

0 

l" 


"l 

1 

l" 

II 

i 

1 

0 

II 

OO 

l 

0 

0 


0 

1 

1 


0 

1 

1 


In Exercises 39-40, simplify the expression assuming that A, 
B, C, and D are invertible. 

39. $(AB)^{-1}(AC^{-1})(D^{-1}C^{-1})^{-1}D^{-1}$ 

40. $(AC^{-1})^{-1}(AC^{-1})(AC^{-1})^{-1}AD^{-1}$ 

41. Show that if R is a 1 x n matrix and C is an n x 1 matrix, 
then RC = tr (CR). 


49. Assuming that all matrices are n × n and invertible, solve 
for D. 

$$C^T B^{-1} A^2 B A C^{-1} D A^{-2} B^T C^{-2} = C^T$$

50. Assuming that all matrices are n × n and invertible, solve 
for D. 

$$ABC^T DBA^T C = AB^T$$

Working with Proofs 

In Exercises 51-58, prove the stated result. 

51. Theorem 1.4.1(a) 52. Theorem 1.4.1(b) 

53. Theorem 1.4.1(f) 54. Theorem 1.4.1(c) 

55. Theorem 1.4.2(c) 56. Theorem 1.4.2(b) 

57. Theorem 1.4.8(d) 58. Theorem 1.4.8(e) 


42. If A is a square matrix and n is a positive integer, is it true that 
$(A^n)^T = (A^T)^n$? Justify your answer. 


43. (a) Show that if A is invertible and AB = AC, then B = C. 

(b) Explain why part (a) and Example 3 do not contradict one 
another. 

44. Show that if A is invertible and k is any nonzero scalar, then 
$(kA)^n = k^n A^n$ for all integer values of n. 

45. (a) Show that if A, B, and A + B are invertible matrices with 
the same size, then 

$$A(A^{-1} + B^{-1})B(A + B)^{-1} = I$$

(b) What does the result in part (a) tell you about the matrix 
$A^{-1} + B^{-1}$? 

46. A square matrix A is said to be idempotent if $A^2 = A$. 

(a) Show that if A is idempotent, then so is $I - A$. 

(b) Show that if A is idempotent, then $2A - I$ is invertible 
and is its own inverse. 


True-False Exercises 

TF. In parts (a)-(k) determine whether the statement is true or 

false, and justify your answer. 

(a) Two n x n matrices, A and B , are inverses of one another if 
and only if AB = BA = 0. 

(b) For all square matrices A and B of the same size, it is true that 
$(A + B)^2 = A^2 + 2AB + B^2$. 

(c) For all square matrices A and B of the same size, it is true that 
$A^2 - B^2 = (A - B)(A + B)$. 

(d) If A and B are invertible matrices of the same size, then AB is 
invertible and $(AB)^{-1} = A^{-1}B^{-1}$. 

(e) If A and B are matrices such that AB is defined, then it is true 
that $(AB)^T = A^T B^T$. 

47. Show that if A is a square matrix such that $A^k = 0$ for some 
positive integer k, then the matrix $I - A$ is invertible and 

$$(I - A)^{-1} = I + A + A^2 + \cdots + A^{k-1}$$

48. Show that the matrix 

$$A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$$

satisfies the equation 

$$A^2 - (a + d)A + (ad - bc)I = 0$$

(f) The matrix 

$$A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$$

is invertible if and only if $ad - bc \neq 0$. 

(g) If A and B are matrices of the same size and k is a constant, 
then $(kA + B)^T = kA^T + B^T$. 

(h) If A is an invertible matrix, then so is $A^T$. 

(i) If $p(x) = a_0 + a_1x + a_2x^2 + \cdots + a_mx^m$ and I is an identity 
matrix, then $p(I) = a_0 + a_1 + a_2 + \cdots + a_m$. 

(j) A square matrix containing a row or column of zeros cannot 
be invertible. 

(k) The sum of two invertible matrices of the same size must be 
invertible. 
Working with Technology 

T1. Let A be the matrix 

$$A = \begin{bmatrix} 0 & \frac{1}{2} & \frac{1}{3} \\ \frac{1}{4} & 0 & \frac{1}{5} \\ \frac{1}{6} & \frac{1}{7} & 0 \end{bmatrix}$$

Discuss the behavior of $A^k$ as k increases indefinitely, that is, as 
$k \to \infty$. 

T2. In each part use your technology utility to make a conjecture 
about the form of $A^n$ for positive integer powers of n. 

(a) $A = \begin{bmatrix} a & 1 \\ 0 & a \end{bmatrix}$ (b) $A = \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix}$ 

T3. The Fibonacci sequence (named for the Italian mathematician 
Leonardo Fibonacci 1170-1250) is 

0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, . . . 

the terms of which are commonly denoted as 

$$F_0, F_1, F_2, F_3, \ldots, F_n, \ldots$$

After the initial terms $F_0 = 0$ and $F_1 = 1$, each term is the sum of 
the previous two; that is, 

$$F_n = F_{n-1} + F_{n-2}$$

Confirm that if 

$$Q = \begin{bmatrix} F_2 & F_1 \\ F_1 & F_0 \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix}$$

then 

$$Q^n = \begin{bmatrix} F_{n+1} & F_n \\ F_n & F_{n-1} \end{bmatrix}$$

1.5 Elementary Matrices and a Method for Finding $A^{-1}$ 

In this section we will develop an algorithm for finding the inverse of a matrix, and we will 
discuss some of the basic properties of invertible matrices. 

In Section 1.1 we defined three elementary row operations on a matrix A: 

1. Multiply a row by a nonzero constant c. 

2. Interchange two rows. 

3. Add a constant c times one row to another. 

It should be evident that if we let B be the matrix that results from A by performing one 
of the operations in this list, then the matrix A can be recovered from B by performing 
the corresponding operation in the following list: 

1. Multiply the same row by 1/c. 

2. Interchange the same two rows. 

3. If B resulted by adding c times row $r_i$ of A to row $r_j$, then add $-c$ times $r_i$ to $r_j$. 

It follows that if B is obtained from A by performing a sequence of elementary row 
operations, then there is a second sequence of elementary row operations, which when 
applied to B recovers A (Exercise 33). Accordingly, we make the following definition. 


DEFINITION 1 Matrices A and B are said to be row equivalent if either (hence each) 
can be obtained from the other by a sequence of elementary row operations. 


Our next goal is to show how matrix multiplication can be used to carry out an 
elementary row operation. 


DEFINITION 2 A matrix E is called an elementary matrix if it can be obtained from 
an identity matrix by performing a single elementary row operation. 
► EXAMPLE 1 Elementary Matrices and Row Operations 

Listed below are four elementary matrices and the operations that produce them. 

$$\begin{bmatrix} 1 & 0 \\ 0 & -3 \end{bmatrix}$$

Multiply the second row of $I_2$ by $-3$. 

$$\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \end{bmatrix}$$

Interchange the second and fourth rows of $I_4$. 

$$\begin{bmatrix} 1 & 0 & 3 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

Add 3 times the third row of $I_3$ to the first row. 

$$\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

Multiply the first row of $I_3$ by 1. ◄ 


The following theorem, whose proof is left as an exercise, shows that when a matrix A 
is multiplied on the left by an elementary matrix E, the effect is to perform an elementary 
row operation on A . 


THEOREM 1.5.1 Row Operations by Matrix Multiplication 

If the elementary matrix E results from performing a certain row operation on $I_m$ and 
if A is an m × n matrix, then the product EA is the matrix that results when this same 
row operation is performed on A. 


► EXAMPLE 2 Using Elementary Matrices 

Consider the matrix 

$$A = \begin{bmatrix} 1 & 0 & 2 & 3 \\ 2 & -1 & 3 & 6 \\ 1 & 4 & 4 & 0 \end{bmatrix}$$

and consider the elementary matrix 

$$E = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 3 & 0 & 1 \end{bmatrix}$$

which results from adding 3 times the first row of $I_3$ to the third row. The product EA is 

$$EA = \begin{bmatrix} 1 & 0 & 2 & 3 \\ 2 & -1 & 3 & 6 \\ 4 & 4 & 10 & 9 \end{bmatrix}$$

which is precisely the matrix that results when we add 3 times the first row of A to the 
third row. ◄ 

Theorem 1.5.1 will be a useful tool for developing new results about matrices, but as a 
practical matter it is usually preferable to perform row operations directly. 
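
The computation in Example 2 can be reproduced in a few lines. The following is a minimal Python sketch with matrices stored as lists of rows; the helper name `matmul` is an illustrative choice. It forms the product EA and compares it with the result of applying the row operation to A directly.

```python
def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

A = [[1, 0, 2, 3],
     [2, -1, 3, 6],
     [1, 4, 4, 0]]

# Elementary matrix obtained from I_3 by adding 3 times the first row to the third.
E = [[1, 0, 0],
     [0, 1, 0],
     [3, 0, 1]]

# The same row operation applied directly to A.
direct = [row[:] for row in A]
direct[2] = [a3 + 3 * a1 for a1, a3 in zip(A[0], A[2])]

EA = matmul(E, A)
print(EA)            # [[1, 0, 2, 3], [2, -1, 3, 6], [4, 4, 10, 9]]
print(EA == direct)  # True
```

The direct version touches only one row, which is why performing row operations directly is usually preferable in practice.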


We know from the discussion at the beginning of this section that if E is an elementary 
matrix that results from performing an elementary row operation on an identity matrix 
I, then there is a second elementary row operation, which when applied to E produces 
I back again. Table 1 lists these operations. The operations on the right side of the table 
are called the inverse operations of the corresponding operations on the left. 
Table 1 

Row Operation on I That Produces E      Row Operation on E That Reproduces I 
Multiply row i by c ≠ 0                 Multiply row i by 1/c 
Interchange rows i and j                Interchange rows i and j 
Add c times row i to row j              Add -c times row i to row j 


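► EXAMPLE 3 Row Operations and Inverse Row Operations 

In each of the following, an elementary row operation is applied to the 2 × 2 identity 
matrix to obtain an elementary matrix E, then E is restored to the identity matrix by 
applying the inverse row operation. 

$$\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \longrightarrow \begin{bmatrix} 1 & 0 \\ 0 & 7 \end{bmatrix} \longrightarrow \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$$

Multiply the second row by 7, then multiply the second row by $\frac{1}{7}$. 

$$\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \longrightarrow \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \longrightarrow \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$$

Interchange the first and second rows, then interchange the first and second rows. 

$$\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \longrightarrow \begin{bmatrix} 1 & 5 \\ 0 & 1 \end{bmatrix} \longrightarrow \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$$

Add 5 times the second row to the first, then add $-5$ times the second row to the first. ◄ 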


The next theorem is a key result about invertibility of elementary matrices. It will be 
a building block for many results that follow. 


THEOREM 1.5.2 Every elementary matrix is invertible, and the inverse is also an 
elementary matrix. 


Proof If E is an elementary matrix, then E results by performing some row operation 
on I. Let $E_0$ be the matrix that results when the inverse of this operation is performed 
on I. Applying Theorem 1.5.1 and using the fact that inverse row operations cancel the 
effect of each other, it follows that 

$$E_0E = I \quad \text{and} \quad EE_0 = I$$

Thus, the elementary matrix $E_0$ is the inverse of E. 
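
The relations $E_0E = I$ and $EE_0 = I$ can be checked for the three pairs of operations in Example 3. The following is a minimal Python sketch, assuming exact arithmetic from the standard `fractions` module; the function names `scale`, `swap`, and `add_multiple` are illustrative, and rows are indexed from 0 rather than from 1 as in the text.

```python
from fractions import Fraction

def identity(n):
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]

def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def scale(n, i, c):
    """Elementary matrix: multiply row i of I_n by the nonzero constant c."""
    E = identity(n)
    E[i][i] = Fraction(c)
    return E

def swap(n, i, j):
    """Elementary matrix: interchange rows i and j of I_n."""
    E = identity(n)
    E[i], E[j] = E[j], E[i]
    return E

def add_multiple(n, src, dst, c):
    """Elementary matrix: add c times row src of I_n to row dst (src != dst)."""
    E = identity(n)
    E[dst][src] = Fraction(c)
    return E

# Each pair is (E, E0): an elementary matrix and the matrix of the inverse operation.
pairs = [
    (scale(2, 1, 7), scale(2, 1, Fraction(1, 7))),
    (swap(2, 0, 1), swap(2, 0, 1)),
    (add_multiple(2, 1, 0, 5), add_multiple(2, 1, 0, -5)),
]
I2 = identity(2)
checks = [matmul(E0, E) == I2 and matmul(E, E0) == I2 for E, E0 in pairs]
print(checks)  # [True, True, True]
```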

Equivalence Theorem 

One of our objectives as we progress through this text is to show how seemingly diverse 
ideas in linear algebra are related. The following theorem, which relates results we 
have obtained about invertibility of matrices, homogeneous linear systems, reduced row 
echelon forms, and elementary matrices, is our first step in that direction. As we study 
new topics, more statements will be added to this theorem. 


THEOREM 1.5.3 Equivalent Statements 

If A is an n × n matrix, then the following statements are equivalent, that is, all true or 
all false. 

(a) A is invertible. 

(b) $A\mathbf{x} = \mathbf{0}$ has only the trivial solution. 

(c) The reduced row echelon form of A is $I_n$. 

(d) A is expressible as a product of elementary matrices. 


The following figure illustrates visually that from the sequence of implications 

(a) ⇒ (b) ⇒ (c) ⇒ (d) ⇒ (a) 

we can conclude that 

(d) ⇒ (c) ⇒ (b) ⇒ (a) 

and hence that 

(a) ⇔ (b) ⇔ (c) ⇔ (d) 

(see Appendix A). 


Proof We will prove the equivalence by establishing the chain of implications: 

(a) ⇒ (b) ⇒ (c) ⇒ (d) ⇒ (a). 

(a) ⇒ (b) Assume A is invertible and let $\mathbf{x}_0$ be any solution of $A\mathbf{x} = \mathbf{0}$. Multiplying both 
sides of this equation by the matrix $A^{-1}$ gives $A^{-1}(A\mathbf{x}_0) = A^{-1}\mathbf{0}$, or $(A^{-1}A)\mathbf{x}_0 = \mathbf{0}$, or 
$I\mathbf{x}_0 = \mathbf{0}$, or $\mathbf{x}_0 = \mathbf{0}$. Thus, $A\mathbf{x} = \mathbf{0}$ has only the trivial solution. 



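(b) ⇒ (c) Let $A\mathbf{x} = \mathbf{0}$ be the matrix form of the system 

$$\begin{aligned} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= 0 \\ a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= 0 \\ &\;\;\vdots \\ a_{n1}x_1 + a_{n2}x_2 + \cdots + a_{nn}x_n &= 0 \end{aligned} \tag{1}$$

and assume that the system has only the trivial solution. If we solve by Gauss-Jordan 
elimination, then the system of equations corresponding to the reduced row echelon 
form of the augmented matrix will be 

$$\begin{aligned} x_1 &= 0 \\ x_2 &= 0 \\ &\;\;\vdots \\ x_n &= 0 \end{aligned} \tag{2}$$

Thus the augmented matrix 

$$\left[\begin{array}{cccc|c} a_{11} & a_{12} & \cdots & a_{1n} & 0 \\ a_{21} & a_{22} & \cdots & a_{2n} & 0 \\ \vdots & \vdots & & \vdots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} & 0 \end{array}\right]$$

for (1) can be reduced to the augmented matrix 

$$\left[\begin{array}{ccccc|c} 1 & 0 & 0 & \cdots & 0 & 0 \\ 0 & 1 & 0 & \cdots & 0 & 0 \\ 0 & 0 & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & 1 & 0 \end{array}\right]$$

for (2) by a sequence of elementary row operations. If we disregard the last column (all 
zeros) in each of these matrices, we can conclude that the reduced row echelon form of 
A is $I_n$. 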

(c) ⇒ (d) Assume that the reduced row echelon form of A is $I_n$, so that A can be reduced 
to $I_n$ by a finite sequence of elementary row operations. By Theorem 1.5.1, each of these 
operations can be accomplished by multiplying on the left by an appropriate elementary 
matrix. Thus we can find elementary matrices $E_1, E_2, \ldots, E_k$ such that 

$$E_k \cdots E_2 E_1 A = I_n \tag{3}$$

By Theorem 1.5.2, $E_1, E_2, \ldots, E_k$ are invertible. Multiplying both sides of Equation (3) 
on the left successively by $E_k^{-1}, \ldots, E_2^{-1}, E_1^{-1}$ we obtain 

$$A = E_1^{-1}E_2^{-1} \cdots E_k^{-1}I_n = E_1^{-1}E_2^{-1} \cdots E_k^{-1} \tag{4}$$

By Theorem 1.5.2, this equation expresses A as a product of elementary matrices. 

(d) ⇒ (a) If A is a product of elementary matrices, then from Theorems 1.4.7 and 1.5.2, 
the matrix A is a product of invertible matrices and hence is invertible. 
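
Equations (3) and (4) can be illustrated computationally. The following is a minimal Python sketch under stated assumptions: it records the row operations of one particular Gauss-Jordan reduction (not necessarily the sequence a reader would choose by hand), builds the corresponding elementary matrices, and checks both equations. The tuple encoding of operations and the names `elementary`, `inverse_op`, and `reduction_ops` are illustrative; rows are indexed from 0.

```python
from fractions import Fraction

def identity(n):
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]

def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def elementary(n, op):
    """Elementary matrix for ('scale', i, c), ('swap', i, j) or ('add', src, dst, c)."""
    E = identity(n)
    if op[0] == 'scale':
        E[op[1]][op[1]] = op[2]
    elif op[0] == 'swap':
        E[op[1]], E[op[2]] = E[op[2]], E[op[1]]
    else:
        E[op[2]][op[1]] = op[3]  # add c times row src to row dst
    return E

def inverse_op(op):
    """The inverse row operation, as listed in Table 1."""
    if op[0] == 'scale':
        return ('scale', op[1], 1 / op[2])
    if op[0] == 'swap':
        return op
    return ('add', op[1], op[2], -op[3])

def reduction_ops(A):
    """Row operations, in order, that reduce an invertible matrix A to I."""
    n = len(A)
    M = [[Fraction(x) for x in row] for row in A]
    ops = []

    def apply(op):
        nonlocal M
        M = matmul(elementary(n, op), M)  # Theorem 1.5.1: left-multiply by E
        ops.append(op)

    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is not invertible")
        if pivot != col:
            apply(('swap', col, pivot))
        if M[col][col] != 1:
            apply(('scale', col, 1 / M[col][col]))
        for r in range(n):
            if r != col and M[r][col] != 0:
                apply(('add', col, r, -M[r][col]))
    return ops

A = [[1, 2, 3], [2, 5, 3], [1, 0, 8]]
n = len(A)
ops = reduction_ops(A)

# Equation (3): E_k ... E_2 E_1 A = I_n
reduced = A
for op in ops:
    reduced = matmul(elementary(n, op), reduced)

# Equation (4): A = E_1^{-1} E_2^{-1} ... E_k^{-1}
product = identity(n)
for op in ops:
    product = matmul(product, elementary(n, inverse_op(op)))

print(reduced == identity(n))  # True
print(product == A)            # True
```

The factorization is not unique: a different sequence of row operations that reduces A to the identity yields a different product of elementary matrices.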


As a first application of Theorem 1.5.3, we will develop a procedure (or algorithm) that 
can be used to tell whether a given matrix is invertible, and if so, produce its inverse. To 
derive this algorithm, assume for the moment that A is an invertible n × n matrix. In 
Equation (3), the elementary matrices execute a sequence of row operations that reduce 
A to $I_n$. If we multiply both sides of this equation on the right by $A^{-1}$ and simplify, we 
obtain 

$$A^{-1} = E_k \cdots E_2 E_1 I_n$$

But this equation tells us that the same sequence of row operations that reduces A to $I_n$ 
will transform $I_n$ to $A^{-1}$. Thus, we have established the following result. 


Inversion Algorithm To find the inverse of an invertible matrix A, find a sequence of 
elementary row operations that reduces A to the identity and then perform that same 
sequence of operations on $I_n$ to obtain $A^{-1}$. 


A simple method for carrying out this procedure is given in the following example. 


► EXAMPLE 4 Using Row Operations to Find $A^{-1}$ 

Find the inverse of 

$$A = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 5 & 3 \\ 1 & 0 & 8 \end{bmatrix}$$

Solution We want to reduce A to the identity matrix by row operations and simultaneously 
apply these operations to I to produce $A^{-1}$. To accomplish this we will adjoin the 
identity matrix to the right side of A, thereby producing a partitioned matrix of the form 

$$[A \mid I]$$

Then we will apply row operations to this matrix until the left side is reduced to I; these 
operations will convert the right side to $A^{-1}$, so the final matrix will have the form 

$$[I \mid A^{-1}]$$

The computations are as follows: 

$$\left[\begin{array}{rrr|rrr} 1 & 2 & 3 & 1 & 0 & 0 \\ 2 & 5 & 3 & 0 & 1 & 0 \\ 1 & 0 & 8 & 0 & 0 & 1 \end{array}\right]$$

$$\left[\begin{array}{rrr|rrr} 1 & 2 & 3 & 1 & 0 & 0 \\ 0 & 1 & -3 & -2 & 1 & 0 \\ 0 & -2 & 5 & -1 & 0 & 1 \end{array}\right]$$

We added $-2$ times the first row to the second and $-1$ times the first row to the third. 

$$\left[\begin{array}{rrr|rrr} 1 & 2 & 3 & 1 & 0 & 0 \\ 0 & 1 & -3 & -2 & 1 & 0 \\ 0 & 0 & -1 & -5 & 2 & 1 \end{array}\right]$$

We added 2 times the second row to the third. 

$$\left[\begin{array}{rrr|rrr} 1 & 2 & 3 & 1 & 0 & 0 \\ 0 & 1 & -3 & -2 & 1 & 0 \\ 0 & 0 & 1 & 5 & -2 & -1 \end{array}\right]$$

We multiplied the third row by $-1$. 

$$\left[\begin{array}{rrr|rrr} 1 & 2 & 0 & -14 & 6 & 3 \\ 0 & 1 & 0 & 13 & -5 & -3 \\ 0 & 0 & 1 & 5 & -2 & -1 \end{array}\right]$$

We added 3 times the third row to the second and $-3$ times the third row to the first. 

$$\left[\begin{array}{rrr|rrr} 1 & 0 & 0 & -40 & 16 & 9 \\ 0 & 1 & 0 & 13 & -5 & -3 \\ 0 & 0 & 1 & 5 & -2 & -1 \end{array}\right]$$

We added $-2$ times the second row to the first. 

Thus, 

$$A^{-1} = \begin{bmatrix} -40 & 16 & 9 \\ 13 & -5 & -3 \\ 5 & -2 & -1 \end{bmatrix}$$

◄ 
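
The procedure of Example 4 translates directly into code. The following is a minimal Python sketch, assuming exact arithmetic from the standard `fractions` module; the function name `invert` is illustrative. It reduces the partitioned matrix $[A \mid I]$ column by column, so the individual row operations differ from those chosen by hand in the example, but the final right-hand block is the same. If no nonzero pivot is available in some column, the left side cannot be reduced to I and the sketch returns `None`.

```python
from fractions import Fraction

def invert(A):
    """Return the inverse of A by row reducing [A | I], or None if A is not invertible."""
    n = len(A)
    # Build the partitioned matrix [A | I] with exact rational entries.
    M = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero entry in this column.
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return None  # the left side cannot be reduced to I
        M[col], M[pivot] = M[pivot], M[col]          # interchange two rows
        p = M[col][col]
        M[col] = [x / p for x in M[col]]             # multiply a row by 1/p
        for r in range(n):
            if r != col:
                f = M[r][col]
                # add -f times the pivot row to row r
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [row[n:] for row in M]                    # right block is the inverse

A = [[1, 2, 3], [2, 5, 3], [1, 0, 8]]
A_inv = invert(A)
print(A_inv == [[-40, 16, 9], [13, -5, -3], [5, -2, -1]])  # True
```

Exact fractions are used here so that the comparison with the hand computation is not affected by floating-point rounding.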


Often it will not be known in advance if a given n × n matrix A is invertible. However, 
if it is not, then by parts (a) and (c) of Theorem 1.5.3 it will be impossible to reduce A 
to $I_n$ by elementary row operations. This will be signaled by a row of zeros appearing 
on the left side of the partition at some stage of the inversion algorithm. If this occurs, 
then you can stop the computations and conclude that A is not invertible. 


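► EXAMPLE 5 Showing That a Matrix Is Not Invertible 

Consider the matrix 

$$A = \begin{bmatrix} 1 & 6 & 4 \\ 2 & 4 & -1 \\ -1 & 2 & 5 \end{bmatrix}$$

Applying the procedure of Example 4 yields 

$$\left[\begin{array}{rrr|rrr} 1 & 6 & 4 & 1 & 0 & 0 \\ 2 & 4 & -1 & 0 & 1 & 0 \\ -1 & 2 & 5 & 0 & 0 & 1 \end{array}\right]$$

$$\left[\begin{array}{rrr|rrr} 1 & 6 & 4 & 1 & 0 & 0 \\ 0 & -8 & -9 & -2 & 1 & 0 \\ 0 & 8 & 9 & 1 & 0 & 1 \end{array}\right]$$

We added $-2$ times the first row to the second and added the first row to the third. 

$$\left[\begin{array}{rrr|rrr} 1 & 6 & 4 & 1 & 0 & 0 \\ 0 & -8 & -9 & -2 & 1 & 0 \\ 0 & 0 & 0 & -1 & 1 & 1 \end{array}\right]$$

We added the second row to the third. 

Since we have obtained a row of zeros on the left side, A is not invertible. ◄ 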


► EXAMPLE 6 Analyzing Homogeneous Systems 

Use Theorem 1.5.3 to determine whether the given homogeneous system has nontrivial 
solutions. 

(a) 
$$\begin{aligned} x_1 + 2x_2 + 3x_3 &= 0 \\ 2x_1 + 5x_2 + 3x_3 &= 0 \\ x_1 \phantom{{}+ 2x_2} + 8x_3 &= 0 \end{aligned}$$

(b) 
$$\begin{aligned} x_1 + 6x_2 + 4x_3 &= 0 \\ 2x_1 + 4x_2 - x_3 &= 0 \\ -x_1 + 2x_2 + 5x_3 &= 0 \end{aligned}$$

Solution From parts (a) and (b) of Theorem 1.5.3 a homogeneous linear system has 
only the trivial solution if and only if its coefficient matrix is invertible. From Examples 4 
and 5 the coefficient matrix of system (a) is invertible and that of system (b) is not. Thus, 
system (a) has only the trivial solution while system (b) has nontrivial solutions. ◄ 
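
The test used in Example 6 can also be phrased through part (c) of Theorem 1.5.3: the homogeneous system has only the trivial solution exactly when the reduced row echelon form of the coefficient matrix is the identity. The following is a minimal Python sketch of that check, assuming exact arithmetic from the standard `fractions` module; the names `rref` and `has_only_trivial_solution` are illustrative.

```python
from fractions import Fraction

def rref(A):
    """Reduced row echelon form of A, computed with exact fractions."""
    M = [[Fraction(x) for x in row] for row in A]
    rows, cols = len(M), len(M[0])
    lead = 0  # row in which the next leading 1 will be placed
    for col in range(cols):
        pivot = next((r for r in range(lead, rows) if M[r][col] != 0), None)
        if pivot is None:
            continue  # no leading 1 in this column
        M[lead], M[pivot] = M[pivot], M[lead]
        p = M[lead][col]
        M[lead] = [x / p for x in M[lead]]
        for r in range(rows):
            if r != lead:
                f = M[r][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[lead])]
        lead += 1
        if lead == rows:
            break
    return M

def has_only_trivial_solution(A):
    """True when Ax = 0 has only the trivial solution (A square)."""
    n = len(A)
    identity = [[int(i == j) for j in range(n)] for i in range(n)]
    return rref(A) == identity

system_a = [[1, 2, 3], [2, 5, 3], [1, 0, 8]]
system_b = [[1, 6, 4], [2, 4, -1], [-1, 2, 5]]
print(has_only_trivial_solution(system_a))  # True
print(has_only_trivial_solution(system_b))  # False
```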


Exercise Set 1.5 


In Exercises 1—2, determine whether the given matrix is ele- 
mentary. 


1. (a) 



0 

1 



1 

0 


(0 


1 

0 

0 


1 0 
0 1 
0 0 


(d) 


0 

1 

0 

0 


0 

0 

1 

0 


2 ' 

0 

0 

1 


2. (a) 


1 

0 


0 

V3 


(b) 


0 

1 

0 


1 

0 

0 



"l 

0 

o' 


'-1 

0 

o' 

(C) 

0 

1 

9 

(d) 

0 

0 

1 


0 

0 

1 


0 

1 

0 


In Exercises 3-4, find a row operation and the corresponding 
elementary matrix that will restore the given elementary matrix to 
the identity matrix. 










’-7 


0 

o ' 




"l 

- 3 ' 











3 . 

(a) 

0 

1 




( b ) 

0 


1 

0 








0 


0 

1 










'0 

0 


1 

ol 



1 

0 


0 ' 

















0 

1 


0 

0 


(0 

0 

1 


0 


( d ) 

1 

0 


0 










0 



-5 

0 


1 

















_0 

0 


0 


1 _ 









'1 

0 


o ' 





1 

o ’ 











4. 

(a) 

-3 

1 




( b ) 

0 

1 


0 











0 

0 


3 





'0 

0 

0 


f 


'1 


0 



1 

7 



0 

1 

0 

0 


0 


1 



0 


(c) 

0 

0 

1 



( d ) 

0 


0 



1 


0 






1 

0 

0 

0 


0 


0 



0 


In Exercises 5-6, an elementary matrix E and a matrix A are 
given. Identify the row operation corresponding to E and verify 
that the product EA results from applying the row operation 
to A. 


5. (a) E = 

0 

l" 


A 


’-1 


-2 

5 

-l" 



1 

0 




3 


-6 

-6 

-6 




"l 

0 

o' 




'2 

-1 

0 

-4 

-4" 

(b) E = 

0 

1 

0 

, 

A 


1 

-3 

-1 

5 

3 


0 

-3 

1 




2 

0 

1 

3 

-1 


'l 

0 

4 




'l 

4' 





(c) E = 

0 

1 

0 

, 


= 

2 

5 






0 

0 

1 




3 

6 











_ 





_ 





1 

0 

o' 


'2 

-1 

0 

-4 

-4' 

(b )E = 

-4 

1 

0 

, A = 

1 

-3 

-1 

5 

3 


0 

0 

1 


2 

0 

1 

3 

-1 







L 



'l 

0 

o' 


'1 

4' 

(c) E = 

0 

5 

0 

, a = 

2 

5 


0 

0 

1 


3 

6 


In Exercises 7-8, use the following matrices and find an ele- 
mentary matrix E that satisfies the stated equation. 



'3 

4 

l" 


~8 1 

5' 

A = 

2 

-7 

-1 

, B = 

2 -7 

-1 


8 

1 

5 


3 4 

1 


'3 

4 

l" 


8 1 

5 ’ 

C = 

2 

-7 

-1 

, D = 

-6 21 

3 


2 

-7 

3 


3 4 

1 


'8 

1 5 ' 





F = 

8 

1 1 






3 

4 1 





EA = 

B 



(b) EB = A 


EA = 

C 



(d) EC = A 


EB = 

D 



(b) ED = B 


EB = 

F 



(d) EE = B 



In Exercises 9-10, first use Theorem 1.4.5 and then use the 
inversion algorithm to find $A^{-1}$, if it exists. 

10. (a) $A = \begin{bmatrix} 1 & -5 \\ 3 & -16 \end{bmatrix}$ (b) $A = \begin{bmatrix} 6 & 4 \\ -3 & -2 \end{bmatrix}$ 

In Exercises 11-12, use the inversion algorithm to find the inverse 
of the matrix (if the inverse exists). 

11. (a) $\begin{bmatrix} 1 & 2 & 3 \\ 2 & 5 & 3 \\ 1 & 0 & 8 \end{bmatrix}$ (b) $\begin{bmatrix} -1 & 3 & -4 \\ 2 & 4 & 1 \\ -4 & 2 & -9 \end{bmatrix}$ 


- 1 1 

5 5 

2" 

5 


-1 1 2 - 

555 

12. (a) 

1 1 

5 5 

1 

10 

(b) 

2 3 3 

5 5 10 


1 4 

1 


1 4 1 


-5 5 

10- 


-5 5 10- 


In Exercises 13-18, use the inversion algorithm to find the inverse 
of the matrix (if the inverse exists). 



'l 

0 

l" 




' V2 

3x/2 

o' 

13. 

0 

1 

1 



14. 

-4V2 

V2 

0 


1 

1 

0 




0 

0 

1 








"1 0 

0 o' 



'2 

6 

6' 




1 3 

0 0 


15. 

2 

7 

6 



16. 

1 3 

5 0 



2 

7 

7 














_1 3 

5 7_ 



'2 

-4 


0 

O' 


O 

O 

2 

O' 


1 

2 


12 

0 


1 0 

0 

1 

17. 

0 

0 


2 

0 

18. 

0 -1 

3 

0 


_0 

-1 


-4 

— 5_ 


2 1 

5 

— 3_ 


In Exercises 19-20, find the inverse of each of the following 
4 × 4 matrices, where $k_1, k_2, k_3, k_4$, and $k$ are all nonzero. 




0 

0 

0 " 


~k 

1 

0 

O' 

19. (a) 

0 

k 2 

0 

0 

(b) 

0 

1 

0 

0 

0 

0 

k-i 

0 

0 

0 

k 

1 


_0 

0 

0 

U_ 


_0 

0 

0 

1 _ 


"0 

0 

0 

k{ 


k 

0 

0 

o' 

20. (a) 

0 

0 

k 2 

0 

(b) 

1 

k 

0 

0 

0 

& 3 

0 

0 

0 

1 

k 

0 


k\ 

0 

0 

0 _ 


_0 

0 

1 

k 


In Exercises 21-22, find all values of c, if any, for which the 
given matrix is invertible. 


c 

C 

c 


c 

1 

0 

1 

c 

c 

22. 

1 

c 

1 

1 

1 

c 


0 

1 

c 


9. (a) A = 
In Exercises 23-26, express the matrix and its inverse as products 
of elementary matrices. 



'-3 r 


' 1 0" 

23. 

2 2 

24. 

-5 2 


33. Prove that if B is obtained from A by performing a sequence 
of elementary row operations, then there is a second sequence 
of elementary row operations, which when applied to B recov- 
ers A. 



"l 

0 

-2 


"l 

1 

o' 

25. 

0 

4 

3 

26. 

1 

1 

1 


0 

0 

1 


0 

1 

1 


In Exercises 27-28, show that the matrices A and B are row 
equivalent by finding a sequence of elementary row operations 
that produces B from A, and then use that result to find a matrix 
C such that $CA = B$. 



"l 2 

3" 


"l 

0 

5' 



II 

1 4 

1 

, B 

= 

0 

2 

-2 




2 1 

9 


1 

1 

4 




2 

1 

o' 


6 


) 

4' 

28. A = 

-1 

1 

0 

, B = 

-5 



0 


3 

0 

-1 


-1 



-1 


29. Show that if 


A = 


is an elementary matrix, then at least one entry in the third 
row must be zero. 


30. Show that 


A = 


0 a 
b 0 
0 d 
0 0 
0 0 


0 0 
c 0 
0 e 

f 0 
0 h 


0 

0 

0 

8 

0 


is not invertible for any values of the entries. 


True-False Exercises 

TF. In parts (a)-(g) determine whether the statement is true or 

false, and justify your answer. 

(a) The product of two elementary matrices of the same size must 
be an elementary matrix. 

(b) Every elementary matrix is invertible. 

(c) If A and B are row equivalent, and if B and C are row equiv- 
alent, then A and C are row equivalent. 

(d) If A is an n x n matrix that is not invertible, then the linear 
system Ax = 0 has infinitely many solutions. 

(e) If A is an n x n matrix that is not invertible, then the matrix 
obtained by interchanging two rows of A cannot be invertible. 

(f ) If A is invertible and a multiple of the first row of A is added 
to the second row, then the resulting matrix is invertible. 

(g) An expression of an invertible matrix A as a product of ele- 
mentary matrices is unique. 

Working with Proofs 

31. Prove that if A and B are m × n matrices, then A and B are 
row equivalent if and only if A and B have the same reduced 
row echelon form. 

32. Prove that if A is an invertible matrix and B is row equivalent 
to A, then B is also invertible. 

Working with Technology 

T1. It can be proved that if the partitioned matrix 

$$\begin{bmatrix} A & B \\ C & D \end{bmatrix}$$

is invertible, then its inverse is 

$$\begin{bmatrix} A^{-1} + A^{-1}B(D - CA^{-1}B)^{-1}CA^{-1} & -A^{-1}B(D - CA^{-1}B)^{-1} \\ -(D - CA^{-1}B)^{-1}CA^{-1} & (D - CA^{-1}B)^{-1} \end{bmatrix}$$

provided that all of the inverses on the right side exist. Use this 
result to find the inverse of the matrix 

$$\begin{bmatrix} 1 & 2 & 1 & 0 \\ 0 & -1 & 0 & 1 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 3 & 3 \end{bmatrix}$$
1.6 More on Linear Systems and Invertible Matrices 

In this section we will show how the inverse of a matrix can be used to solve a linear system 
and we will develop some more results about invertible matrices. 

Number of Solutions of a Linear System 

In Section 1 . 1 we made the statement (based on Figures 1.1.1 and 1.1.2) that every linear 
system either has no solutions, has exactly one solution, or has infinitely many solutions. 
We are now in a position to prove this fundamental result. 


THEOREM 1.6.1 A system of linear equations has zero, one, or infinitely many solutions. 
There are no other possibilities. 


Proof If $A\mathbf{x} = \mathbf{b}$ is a system of linear equations, exactly one of the following is true: 
(a) the system has no solutions, (b) the system has exactly one solution, or (c) the system 
has more than one solution. The proof will be complete if we can show that the system 
has infinitely many solutions in case (c). 

Assume that $A\mathbf{x} = \mathbf{b}$ has more than one solution, and let $\mathbf{x}_0 = \mathbf{x}_1 - \mathbf{x}_2$, where $\mathbf{x}_1$ 
and $\mathbf{x}_2$ are any two distinct solutions. Because $\mathbf{x}_1$ and $\mathbf{x}_2$ are distinct, the matrix $\mathbf{x}_0$ is 
nonzero; moreover, 

$$A\mathbf{x}_0 = A(\mathbf{x}_1 - \mathbf{x}_2) = A\mathbf{x}_1 - A\mathbf{x}_2 = \mathbf{b} - \mathbf{b} = \mathbf{0}$$

If we now let k be any scalar, then 

$$A(\mathbf{x}_1 + k\mathbf{x}_0) = A\mathbf{x}_1 + A(k\mathbf{x}_0) = A\mathbf{x}_1 + k(A\mathbf{x}_0) = \mathbf{b} + k\mathbf{0} = \mathbf{b} + \mathbf{0} = \mathbf{b}$$

But this says that $\mathbf{x}_1 + k\mathbf{x}_0$ is a solution of $A\mathbf{x} = \mathbf{b}$. Since $\mathbf{x}_0$ is nonzero and there are 
infinitely many choices for k, the system $A\mathbf{x} = \mathbf{b}$ has infinitely many solutions. 
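
The construction in this proof can be tried on a small system. The following is a minimal Python sketch; the system $x_1 + x_2 = 2$, $2x_1 + 2x_2 = 4$ and its two solutions are an illustrative example chosen here, not taken from the text. It forms $\mathbf{x}_0 = \mathbf{x}_1 - \mathbf{x}_2$ and confirms that $\mathbf{x}_1 + k\mathbf{x}_0$ solves the system for several values of k.

```python
def matvec(A, x):
    """Product of a matrix (list of rows) with a column vector (list)."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

# Illustrative system with more than one solution.
A = [[1, 1], [2, 2]]
b = [2, 4]
x1 = [2, 0]
x2 = [0, 2]
assert matvec(A, x1) == b and matvec(A, x2) == b  # two distinct solutions

# x0 = x1 - x2 is nonzero and satisfies A x0 = 0.
x0 = [p - q for p, q in zip(x1, x2)]

# Every x1 + k*x0 is again a solution; different k give different solutions.
solutions = [[p + k * q for p, q in zip(x1, x0)] for k in range(-3, 4)]
print(matvec(A, x0))                            # [0, 0]
print(all(matvec(A, x) == b for x in solutions))  # True
print(len({tuple(x) for x in solutions}))         # 7
```

Only seven values of k are sampled here; the proof covers every scalar k, which is what gives infinitely many solutions.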

Solving Linear Systems by Matrix Inversion 

Thus far we have studied two procedures for solving linear systems: Gauss-Jordan 
elimination and Gaussian elimination. The following theorem provides an actual formula 
for the solution of a linear system of n equations in n unknowns in the case where the 
coefficient matrix is invertible. 


THEOREM 1.6.2 If A is an invertible n × n matrix, then for each n × 1 matrix $\mathbf{b}$, the 
system of equations $A\mathbf{x} = \mathbf{b}$ has exactly one solution, namely, $\mathbf{x} = A^{-1}\mathbf{b}$. 


Proof Since $A(A^{-1}\mathbf{b}) = \mathbf{b}$, it follows that $\mathbf{x} = A^{-1}\mathbf{b}$ is a solution of $A\mathbf{x} = \mathbf{b}$. To show 
that this is the only solution, we will assume that $\mathbf{x}_0$ is an arbitrary solution and then 
show that $\mathbf{x}_0$ must be the solution $A^{-1}\mathbf{b}$. 

If $\mathbf{x}_0$ is any solution of $A\mathbf{x} = \mathbf{b}$, then $A\mathbf{x}_0 = \mathbf{b}$. Multiplying both sides of this 
equation by $A^{-1}$, we obtain $\mathbf{x}_0 = A^{-1}\mathbf{b}$. 

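► EXAMPLE 1 Solution of a Linear System Using $A^{-1}$ 

Consider the system of linear equations 

$$\begin{aligned} x_1 + 2x_2 + 3x_3 &= 5 \\ 2x_1 + 5x_2 + 3x_3 &= 3 \\ x_1 \phantom{{}+ 2x_2} + 8x_3 &= 17 \end{aligned}$$

In matrix form this system can be written as $A\mathbf{x} = \mathbf{b}$, where 

$$A = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 5 & 3 \\ 1 & 0 & 8 \end{bmatrix}, \quad \mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}, \quad \mathbf{b} = \begin{bmatrix} 5 \\ 3 \\ 17 \end{bmatrix}$$

In Example 4 of the preceding section, we showed that A is invertible and 

$$A^{-1} = \begin{bmatrix} -40 & 16 & 9 \\ 13 & -5 & -3 \\ 5 & -2 & -1 \end{bmatrix}$$

By Theorem 1.6.2, the solution of the system is 

$$\mathbf{x} = A^{-1}\mathbf{b} = \begin{bmatrix} -40 & 16 & 9 \\ 13 & -5 & -3 \\ 5 & -2 & -1 \end{bmatrix} \begin{bmatrix} 5 \\ 3 \\ 17 \end{bmatrix} = \begin{bmatrix} 1 \\ -1 \\ 2 \end{bmatrix}$$

or $x_1 = 1$, $x_2 = -1$, $x_3 = 2$. ◄ 

Keep in mind that the method of Example 1 only applies when the system has as many 
equations as unknowns and the coefficient matrix is invertible. 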
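
The arithmetic of Example 1 can be verified directly. The following is a minimal Python sketch with the helper name `matvec` chosen for illustration; it multiplies the inverse found in Example 4 of the preceding section by $\mathbf{b}$ and then substitutes the result back into the system.

```python
def matvec(A, x):
    """Product of a matrix (list of rows) with a column vector (list)."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

A = [[1, 2, 3], [2, 5, 3], [1, 0, 8]]
A_inv = [[-40, 16, 9], [13, -5, -3], [5, -2, -1]]  # from Example 4 of Section 1.5
b = [5, 3, 17]

x = matvec(A_inv, b)     # x = A^{-1} b
print(x)                 # [1, -1, 2]
print(matvec(A, x) == b) # True
```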


Linear Systems with a 
Common Coefficient Matrix 


Frequently, one is concerned with solving a sequence of systems 

$$A\mathbf{x} = \mathbf{b}_1, \quad A\mathbf{x} = \mathbf{b}_2, \quad A\mathbf{x} = \mathbf{b}_3, \quad \ldots, \quad A\mathbf{x} = \mathbf{b}_k$$

each of which has the same square coefficient matrix A. If A is invertible, then the 
solutions 

$$\mathbf{x}_1 = A^{-1}\mathbf{b}_1, \quad \mathbf{x}_2 = A^{-1}\mathbf{b}_2, \quad \mathbf{x}_3 = A^{-1}\mathbf{b}_3, \quad \ldots, \quad \mathbf{x}_k = A^{-1}\mathbf{b}_k$$

can be obtained with one matrix inversion and k matrix multiplications. An efficient 
way to do this is to form the partitioned matrix 

$$[A \mid \mathbf{b}_1 \mid \mathbf{b}_2 \mid \cdots \mid \mathbf{b}_k] \tag{1}$$

in which the coefficient matrix A is "augmented" by all k of the matrices $\mathbf{b}_1, \mathbf{b}_2, \ldots, \mathbf{b}_k$, 
and then reduce (1) to reduced row echelon form by Gauss-Jordan elimination. In this 
way we can solve all k systems at once. This method has the added advantage that it 
applies even when A is not invertible. 


► EXAMPLE 2 Solving Two Linear Systems at Once 

Solve the systems 

(a) 
$$\begin{aligned} x_1 + 2x_2 + 3x_3 &= 4 \\ 2x_1 + 5x_2 + 3x_3 &= 5 \\ x_1 \phantom{{}+ 2x_2} + 8x_3 &= 9 \end{aligned}$$

(b) 
$$\begin{aligned} x_1 + 2x_2 + 3x_3 &= 1 \\ 2x_1 + 5x_2 + 3x_3 &= 6 \\ x_1 \phantom{{}+ 2x_2} + 8x_3 &= -6 \end{aligned}$$

Solution The two systems have the same coefficient matrix. If we augment this 
coefficient matrix with the columns of constants on the right sides of these systems, we 
obtain 

$$\left[\begin{array}{rrr|r|r} 1 & 2 & 3 & 4 & 1 \\ 2 & 5 & 3 & 5 & 6 \\ 1 & 0 & 8 & 9 & -6 \end{array}\right]$$

Reducing this matrix to reduced row echelon form yields (verify) 

$$\left[\begin{array}{rrr|r|r} 1 & 0 & 0 & 1 & 2 \\ 0 & 1 & 0 & 0 & 1 \\ 0 & 0 & 1 & 1 & -1 \end{array}\right]$$

It follows from the last two columns that the solution of system (a) is $x_1 = 1$, $x_2 = 0$, 
$x_3 = 1$ and the solution of system (b) is $x_1 = 2$, $x_2 = 1$, $x_3 = -1$. ◄ 
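
The reduction in Example 2 can be carried out mechanically. The following is a minimal Python sketch, assuming exact arithmetic from the standard `fractions` module; the function name `solve_many` is illustrative. As a simplification it handles only the case of an invertible square coefficient matrix and raises an error otherwise, even though the method described above also applies when A is not invertible.

```python
from fractions import Fraction

def solve_many(A, rhs_columns):
    """Solve Ax = b for each b in rhs_columns by reducing [A | b_1 | ... | b_k].

    Simplifying assumption: A is square and invertible.
    """
    n = len(A)
    # Row i of the partitioned matrix: row i of A followed by entry i of each b.
    M = [[Fraction(x) for x in A[i]] + [Fraction(col[i]) for col in rhs_columns]
         for i in range(n)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            raise ValueError("coefficient matrix is not invertible")
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [x / p for x in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    # Column n + j of the reduced matrix is the solution of the j-th system.
    return [[M[i][n + j] for i in range(n)] for j in range(len(rhs_columns))]

A = [[1, 2, 3], [2, 5, 3], [1, 0, 8]]
solutions = solve_many(A, [[4, 5, 9], [1, 6, -6]])
print(solutions == [[1, 0, 1], [2, 1, -1]])  # True
```

The elimination work on A is done once and shared by both right-hand sides, which is the efficiency the partitioned-matrix approach is meant to provide.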


Properties of Invertible Matrices 

Up to now, to show that an n × n matrix A is invertible, it has been necessary to find an 
n × n matrix B such that 

$$AB = I \quad \text{and} \quad BA = I$$

The next theorem shows that if we produce an n × n matrix B satisfying either condition, 
then the other condition will hold automatically. 


THEOREM 1.6.3 Let A be a square matrix. 

(a) If B is a square matrix satisfying $BA = I$, then $B = A^{-1}$. 

(b) If B is a square matrix satisfying $AB = I$, then $B = A^{-1}$. 


We will prove part (a) and leave part (b) as an exercise. 

Proof (a) Assume that $BA = I$. If we can show that A is invertible, the proof can be 
completed by multiplying $BA = I$ on both sides by $A^{-1}$ to obtain 

$$BAA^{-1} = IA^{-1} \quad \text{or} \quad BI = IA^{-1} \quad \text{or} \quad B = A^{-1}$$

To show that A is invertible, it suffices to show that the system $A\mathbf{x} = \mathbf{0}$ has only the trivial 
solution (see Theorem 1.5.3). Let $\mathbf{x}_0$ be any solution of this system. If we multiply both 
sides of $A\mathbf{x}_0 = \mathbf{0}$ on the left by B, we obtain $BA\mathbf{x}_0 = B\mathbf{0}$ or $I\mathbf{x}_0 = \mathbf{0}$ or $\mathbf{x}_0 = \mathbf{0}$. Thus, 
the system of equations $A\mathbf{x} = \mathbf{0}$ has only the trivial solution. 


Equivalence Theorem 

We are now in a position to add two more statements to the four given in Theorem 1.5.3. 


THEOREM 1.6.4 Equivalent Statements 

If A is an n × n matrix, then the following are equivalent. 

(a) A is invertible. 

(b) $A\mathbf{x} = \mathbf{0}$ has only the trivial solution. 

(c) The reduced row echelon form of A is $I_n$. 

(d) A is expressible as a product of elementary matrices. 

(e) $A\mathbf{x} = \mathbf{b}$ is consistent for every n × 1 matrix $\mathbf{b}$. 

(f) $A\mathbf{x} = \mathbf{b}$ has exactly one solution for every n × 1 matrix $\mathbf{b}$. 


Proof Since we proved in Theorem 1.5.3 that (a), (b), (c), and (d) are equivalent, it will 
be sufficient to prove that (a) ⇒ (f) ⇒ (e) ⇒ (a). 

(a) ⇒ (f) This was already proved in Theorem 1.6.2. 

(f) ⇒ (e) This is almost self-evident, for if $A\mathbf{x} = \mathbf{b}$ has exactly one solution for every 
n × 1 matrix $\mathbf{b}$, then $A\mathbf{x} = \mathbf{b}$ is consistent for every n × 1 matrix $\mathbf{b}$. 
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(e) =>■ (a) If the system Ax = b is consistent for every n x I matrix b, then, in particular, 
this is so for the systems 



T 


"o" 


"o" 


0 


1 


0 

Ax = 

0 

, Ax = 

0 

, . . . , Ax = 

0 


0 


0 


1 


Let xi, x 2 , . . . , x„ be solutions of the respective systems, and let us form annxn ma- 
trix C having these solutions as columns. Thus C has the form 


C = [X! I X 2 I • • ■ I X„] 

As discussed in Section 1.3, the successive columns of the product AC will be 

Axi, Ax 2 , . . . , Ax„ 

It follows from the equiva- 
lency of parts (e) and (/) that 
if you can show that Ax = b 
has at least one solution for ev- 
ery n x 1 matrix b, then you 
can conclude that it has ex- 
actly one solution for every 
ii x 1 matrix b. 

We know from earlier work that invertible matrix factors produce an invertible prod- 
uct. Conversely, the following theorem shows that if the product of square matrices is 
invertible, then the factors themselves must be invertible. 


[see Formula (8) of Section 1.3]. Thus, 


AC = [Ax! | Ax 2 | ••• | Ax„] = 


1 0 
0 1 
0 0 


■■ 0 

■■ 0 

■■ 0 


= 1 


0 0 ■■■ 1 

By part ( b ) of Theorem 1.6.3, it follows that C = A -1 . Thus, A is invertible. 


Let A and B be square matrices of the same size. If AB is invertible, 
then A and B must also be invertible. 


Proof We will show first that B is invertible by showing that the homogeneous system $B\mathbf{x} = \mathbf{0}$ has only the trivial solution. If we assume that $\mathbf{x}_0$ is any solution of this system, then

\[
(AB)\mathbf{x}_0 = A(B\mathbf{x}_0) = A\mathbf{0} = \mathbf{0}
\]

so $\mathbf{x}_0 = \mathbf{0}$ by parts (a) and (b) of Theorem 1.6.4 applied to the invertible matrix AB. But the invertibility of B implies the invertibility of $B^{-1}$ (Theorem 1.4.7), which in turn implies that

\[
(AB)B^{-1} = A(BB^{-1}) = AI = A
\]

is invertible since the left side is a product of invertible matrices. This completes the proof.
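
The column-by-column construction used in the proof of (e) $\Rightarrow$ (a) can be tried out numerically. The following is a minimal sketch, not part of the text: the helper names `solve`, `matmul`, and `inverse_by_columns` are hypothetical, and the sample matrix is simply one that is known to be invertible. It solves $A\mathbf{x} = \mathbf{e}_j$ for each standard column $\mathbf{e}_j$, assembles the solutions into C, and checks that both $AC = I$ and $CA = I$ hold, as Theorem 1.6.3 predicts.

```python
from fractions import Fraction

def solve(A, b):
    """Solve Ax = b by Gauss-Jordan elimination (assumes A is square and invertible)."""
    n = len(A)
    # Augmented matrix [A | b] with exact rational entries.
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero entry in this column.
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        lead = M[col][col]
        M[col] = [v / lead for v in M[col]]
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    return [row[n] for row in M]

def matmul(A, B):
    """Ordinary matrix product of two lists-of-rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def inverse_by_columns(A):
    """Build C whose j-th column solves A x = e_j, as in the proof of (e) => (a)."""
    n = len(A)
    columns = [solve(A, [int(i == j) for i in range(n)]) for j in range(n)]
    return [[columns[j][i] for j in range(n)] for i in range(n)]

A = [[1, 2, 3], [2, 5, 3], [1, 0, 8]]          # sample invertible matrix (an assumption)
C = inverse_by_columns(A)
identity = [[int(i == j) for j in range(3)] for i in range(3)]
print(matmul(A, C) == identity, matmul(C, A) == identity)
```

Exact fractions are used instead of floating point so that the comparison with the identity matrix is not affected by rounding.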

In our later work the following fundamental problem will occur frequently in various 
contexts. 


A Fundamental Problem Let A be a fixed $m \times n$ matrix. Find all $m \times 1$ matrices $\mathbf{b}$ such that the system of equations $A\mathbf{x} = \mathbf{b}$ is consistent.

If A is an invertible matrix, Theorem 1.6.2 completely solves this problem by asserting that for every $m \times 1$ matrix $\mathbf{b}$, the linear system $A\mathbf{x} = \mathbf{b}$ has the unique solution $\mathbf{x} = A^{-1}\mathbf{b}$. If A is not square, or if A is square but not invertible, then Theorem 1.6.2 does not apply. In these cases $\mathbf{b}$ must usually satisfy certain conditions in order for $A\mathbf{x} = \mathbf{b}$ to be consistent. The following example illustrates how the methods of Section 1.2 can be used to determine such conditions.


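► EXAMPLE 3 Determining Consistency by Elimination

What conditions must $b_1$, $b_2$, and $b_3$ satisfy in order for the system of equations

\[
\begin{aligned}
x_1 + x_2 + 2x_3 &= b_1 \\
x_1 \phantom{{}+x_2} + x_3 &= b_2 \\
2x_1 + x_2 + 3x_3 &= b_3
\end{aligned}
\]

to be consistent?

Solution The augmented matrix is

\[
\left[\begin{array}{ccc|c}
1 & 1 & 2 & b_1 \\
1 & 0 & 1 & b_2 \\
2 & 1 & 3 & b_3
\end{array}\right]
\]

which can be reduced to row echelon form as follows:

\[
\left[\begin{array}{ccc|c}
1 & 1 & 2 & b_1 \\
0 & -1 & -1 & b_2 - b_1 \\
0 & -1 & -1 & b_3 - 2b_1
\end{array}\right]
\]

$-1$ times the first row was added to the second and $-2$ times the first row was added to the third.

\[
\left[\begin{array}{ccc|c}
1 & 1 & 2 & b_1 \\
0 & 1 & 1 & b_1 - b_2 \\
0 & -1 & -1 & b_3 - 2b_1
\end{array}\right]
\]

The second row was multiplied by $-1$.

\[
\left[\begin{array}{ccc|c}
1 & 1 & 2 & b_1 \\
0 & 1 & 1 & b_1 - b_2 \\
0 & 0 & 0 & b_3 - b_2 - b_1
\end{array}\right]
\]

The second row was added to the third.

It is now evident from the third row in the matrix that the system has a solution if and only if $b_1$, $b_2$, and $b_3$ satisfy the condition

\[
b_3 - b_2 - b_1 = 0 \quad \text{or} \quad b_3 = b_1 + b_2
\]

To express this condition another way, $A\mathbf{x} = \mathbf{b}$ is consistent if and only if $\mathbf{b}$ is a matrix of the form

\[
\mathbf{b} = \begin{bmatrix} b_1 \\ b_2 \\ b_1 + b_2 \end{bmatrix}
\]

where $b_1$ and $b_2$ are arbitrary.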
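
The condition $b_3 = b_1 + b_2$ can be spot-checked by machine. The sketch below is an illustration under stated assumptions rather than the book's method: the function name `is_consistent` is hypothetical, and the two right-hand sides are arbitrary sample values. It performs forward elimination on the augmented matrix and reports the system as inconsistent exactly when a row of the form $[0 \; \cdots \; 0 \mid c]$ with $c \neq 0$ appears.

```python
from fractions import Fraction

def is_consistent(A, b):
    """Forward-eliminate [A | b] and report whether Ax = b has at least one solution."""
    rows, cols = len(A), len(A[0])
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    pivot_row = 0
    for col in range(cols):
        # Look for a usable pivot in this column at or below pivot_row.
        pivot = next((r for r in range(pivot_row, rows) if M[r][col] != 0), None)
        if pivot is None:
            continue
        M[pivot_row], M[pivot] = M[pivot], M[pivot_row]
        for r in range(pivot_row + 1, rows):
            factor = M[r][col] / M[pivot_row][col]
            M[r] = [vr - factor * vp for vr, vp in zip(M[r], M[pivot_row])]
        pivot_row += 1
    # Inconsistent exactly when some row reads 0 = nonzero.
    return not any(all(v == 0 for v in row[:-1]) and row[-1] != 0 for row in M)

A = [[1, 1, 2], [1, 0, 1], [2, 1, 3]]   # coefficient matrix of Example 3
print(is_consistent(A, [1, 2, 3]))      # here b3 equals b1 + b2
print(is_consistent(A, [1, 2, 4]))      # here b3 differs from b1 + b2
```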


► EXAMPLE 4 Determining Consistency by Elimination

What conditions must $b_1$, $b_2$, and $b_3$ satisfy in order for the system of equations

\[
\begin{aligned}
x_1 + 2x_2 + 3x_3 &= b_1 \\
2x_1 + 5x_2 + 3x_3 &= b_2 \\
x_1 \phantom{{}+2x_2} + 8x_3 &= b_3
\end{aligned}
\]

to be consistent?

Solution The augmented matrix is

\[
\left[\begin{array}{ccc|c}
1 & 2 & 3 & b_1 \\
2 & 5 & 3 & b_2 \\
1 & 0 & 8 & b_3
\end{array}\right]
\]

Reducing this to reduced row echelon form yields (verify)

\[
\left[\begin{array}{ccc|c}
1 & 0 & 0 & -40b_1 + 16b_2 + 9b_3 \\
0 & 1 & 0 & 13b_1 - 5b_2 - 3b_3 \\
0 & 0 & 1 & 5b_1 - 2b_2 - b_3
\end{array}\right] \tag{2}
\]

In this case there are no restrictions on $b_1$, $b_2$, and $b_3$, so the system has the unique solution

\[
x_1 = -40b_1 + 16b_2 + 9b_3, \quad x_2 = 13b_1 - 5b_2 - 3b_3, \quad x_3 = 5b_1 - 2b_2 - b_3 \tag{3}
\]

for all values of $b_1$, $b_2$, and $b_3$. ◄

What does the result in Example 4 tell you about the coefficient matrix of the system?
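
Formula (3) can be confirmed by substituting it back into the system. The short sketch below does this for a few sample right-hand sides; the function names `solution_from_formula` and `residual` are hypothetical and the sample values of $b_1$, $b_2$, $b_3$ are arbitrary.

```python
def solution_from_formula(b1, b2, b3):
    """Evaluate Formula (3) of Example 4."""
    x1 = -40 * b1 + 16 * b2 + 9 * b3
    x2 = 13 * b1 - 5 * b2 - 3 * b3
    x3 = 5 * b1 - 2 * b2 - b3
    return x1, x2, x3

def residual(b1, b2, b3):
    """Left side minus right side of each equation in the system of Example 4."""
    x1, x2, x3 = solution_from_formula(b1, b2, b3)
    return (x1 + 2 * x2 + 3 * x3 - b1,
            2 * x1 + 5 * x2 + 3 * x3 - b2,
            x1 + 8 * x3 - b3)

for b in [(1, 0, 0), (0, 1, 0), (0, 0, 1), (3, -2, 7)]:
    print(b, residual(*b))
```

Every residual should be zero, which is another way of saying that the coefficients in (3) are the entries of the inverse of the coefficient matrix.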


Exercise Set 1.6

In Exercises 1-8, solve the system by inverting the coefficient matrix and using Theorem 1.6.2.

1. $x_1 + x_2 = 2$
   $5x_1 + 6x_2 = 9$

2. $4x_1 - 3x_2 = -3$
   $2x_1 - 5x_2 = 9$

3. $x_1 + 3x_2 + x_3 = 4$
   $2x_1 + 2x_2 + x_3 = -1$
   $2x_1 + 3x_2 + x_3 = 3$

4. $5x_1 + 3x_2 + 2x_3 = 4$
   $3x_1 + 3x_2 + 2x_3 = 2$
   $x_2 + x_3 = 5$

5. $x + y + z = 5$
   $x + y - 4z = 10$
   $-4x + y + z = 0$

6. $-x - 2y - 3z = 0$
   $w + x + 4y + 4z = 7$
   $w + 3x + 7y + 9z = 4$
   $-w - 2x - 4y - 6z = 6$

7. $3x_1 + 5x_2 = b_1$
   $x_1 + 2x_2 = b_2$

8. $x_1 + 2x_2 + 3x_3 = b_1$
   $2x_1 + 5x_2 + 5x_3 = b_2$
   $3x_1 + 5x_2 + 8x_3 = b_3$

In Exercises 9-12, solve the linear systems together by reducing the appropriate augmented matrix.

9. $x_1 - 5x_2 = b_1$
   $3x_1 + 2x_2 = b_2$
   (i) $b_1 = 1$, $b_2 = 4$
   (ii) $b_1 = -2$, $b_2 = 5$

10. $-x_1 + 4x_2 + x_3 = b_1$
    $x_1 + 9x_2 - 2x_3 = b_2$
    $6x_1 + 4x_2 - 8x_3 = b_3$
    (i) $b_1 = 0$, $b_2 = 1$, $b_3 = 0$
    (ii) $b_1 = -3$, $b_2 = 4$, $b_3 = -5$

11. $4x_1 - 7x_2 = b_1$
    $x_1 + 2x_2 = b_2$
    (i) $b_1 = 0$, $b_2 = 1$
    (ii) $b_1 = -4$, $b_2 = 6$
    (iii) $b_1 = -1$, $b_2 = 3$
    (iv) $b_1 = -5$, $b_2 = 1$

12. $x_1 + 3x_2 + 5x_3 = b_1$
    $-x_1 - 2x_2 = b_2$
    $2x_1 + 5x_2 + 4x_3 = b_3$
    (i) $b_1 = 1$, $b_2 = 0$, $b_3 = -1$
    (ii) $b_1 = 0$, $b_2 = 1$, $b_3 = 1$
    (iii) $b_1 = -1$, $b_2 = -1$, $b_3 = 0$

In Exercises 13-17, determine conditions on the $b_i$'s, if any, in order to guarantee that the linear system is consistent.

13. $x_1 + 3x_2 = b_1$
    $-2x_1 + x_2 = b_2$

14. $6x_1 - 4x_2 = b_1$
    $3x_1 - 2x_2 = b_2$

15. $x_1 - 2x_2 + 5x_3 = b_1$
    $4x_1 - 5x_2 + 8x_3 = b_2$
    $-3x_1 + 3x_2 - 3x_3 = b_3$

16. $x_1 - 2x_2 - x_3 = b_1$
    $-4x_1 + 5x_2 + 2x_3 = b_2$
    $-4x_1 + 7x_2 + 4x_3 = b_3$

17. $x_1 - x_2 + 3x_3 + 2x_4 = b_1$
    $-2x_1 + x_2 + 5x_3 + x_4 = b_2$
    $-3x_1 + 2x_2 + 2x_3 - x_4 = b_3$
    $4x_1 - 3x_2 + x_3 + 3x_4 = b_4$

18. Consider the matrices

\[
A = \begin{bmatrix} 2 & 1 & 2 \\ 2 & 2 & -2 \\ 3 & 1 & 1 \end{bmatrix}
\quad \text{and} \quad
\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
\]

(a) Show that the equation $A\mathbf{x} = \mathbf{x}$ can be rewritten as $(A - I)\mathbf{x} = \mathbf{0}$ and use this result to solve $A\mathbf{x} = \mathbf{x}$ for $\mathbf{x}$.

(b) Solve $A\mathbf{x} = 4\mathbf{x}$.

In Exercises 19-20, solve the matrix equation for X.

19.
\[
\begin{bmatrix} 1 & -1 & 1 \\ 2 & 3 & 0 \\ 0 & 2 & -1 \end{bmatrix} X =
\begin{bmatrix} 2 & -1 & 5 & 7 & 8 \\ 4 & 0 & -3 & 0 & 1 \\ 3 & 5 & -7 & 2 & 1 \end{bmatrix}
\]

20.
\[
\begin{bmatrix} -2 & 0 & 1 \\ 0 & -1 & -1 \\ 1 & 1 & -4 \end{bmatrix} X =
\begin{bmatrix} 4 & 3 & 2 & 1 \\ 6 & 7 & 8 & 9 \\ 1 & 3 & 7 & 9 \end{bmatrix}
\]

Working with Proofs 

21. Let $A\mathbf{x} = \mathbf{0}$ be a homogeneous system of n linear equations in n unknowns that has only the trivial solution. Prove that if k is any positive integer, then the system $A^k\mathbf{x} = \mathbf{0}$ also has only the trivial solution.

22. Let $A\mathbf{x} = \mathbf{0}$ be a homogeneous system of n linear equations in n unknowns, and let Q be an invertible $n \times n$ matrix. Prove that $A\mathbf{x} = \mathbf{0}$ has only the trivial solution if and only if $(QA)\mathbf{x} = \mathbf{0}$ has only the trivial solution.

23. Let $A\mathbf{x} = \mathbf{b}$ be any consistent system of linear equations, and let $\mathbf{x}_1$ be a fixed solution. Prove that every solution to the system can be written in the form $\mathbf{x} = \mathbf{x}_1 + \mathbf{x}_0$, where $\mathbf{x}_0$ is a solution to $A\mathbf{x} = \mathbf{0}$. Prove also that every matrix of this form is a solution.

24. Use part (a) of Theorem 1.6.3 to prove part (b).

True-False Exercises

TF. In parts (a)-(g) determine whether the statement is true or false, and justify your answer.

(a) It is impossible for a system of linear equations to have exactly two solutions.

(b) If A is a square matrix, and if the linear system $A\mathbf{x} = \mathbf{b}$ has a unique solution, then the linear system $A\mathbf{x} = \mathbf{c}$ also must have a unique solution.

(c) If A and B are $n \times n$ matrices such that $AB = I_n$, then $BA = I_n$.

(d) If A and B are row equivalent matrices, then the linear systems $A\mathbf{x} = \mathbf{0}$ and $B\mathbf{x} = \mathbf{0}$ have the same solution set.

(e) Let A be an $n \times n$ matrix and S an $n \times n$ invertible matrix. If $\mathbf{x}$ is a solution to the linear system $(S^{-1}AS)\mathbf{x} = \mathbf{b}$, then $S\mathbf{x}$ is a solution to the linear system $A\mathbf{y} = S\mathbf{b}$.

(f) Let A be an $n \times n$ matrix. The linear system $A\mathbf{x} = 4\mathbf{x}$ has a unique solution if and only if $A - 4I$ is an invertible matrix.

(g) Let A and B be $n \times n$ matrices. If A or B (or both) are not invertible, then neither is AB.

Working with Technology 

T1. Colors in print media, on computer monitors, and on television screens are implemented using what are called "color models". For example, in the RGB model, colors are created by mixing percentages of red (R), green (G), and blue (B), and in the YIQ model (used in TV broadcasting), colors are created by mixing percentages of luminescence (Y) with percentages of a chrominance factor (I) and a chrominance factor (Q). The conversion from the RGB model to the YIQ model is accomplished by the matrix equation

\[
\begin{bmatrix} Y \\ I \\ Q \end{bmatrix} =
\begin{bmatrix} .299 & .587 & .114 \\ .596 & -.275 & -.321 \\ .212 & -.523 & .311 \end{bmatrix}
\begin{bmatrix} R \\ G \\ B \end{bmatrix}
\]

What matrix would you use to convert the YIQ model to the RGB model?

T2. Let

\[
A = \begin{bmatrix} 1 & -2 & 2 \\ 4 & 5 & 1 \\ 0 & 3 & -1 \end{bmatrix}, \quad
\mathbf{b}_1 = \begin{bmatrix} 0 \\ 1 \\ 7 \end{bmatrix}, \quad
\mathbf{b}_2 = \begin{bmatrix} 11 \\ 5 \\ 3 \end{bmatrix}, \quad
\mathbf{b}_3 = \begin{bmatrix} 1 \\ -4 \\ 2 \end{bmatrix}
\]

Solve the linear systems $A\mathbf{x} = \mathbf{b}_1$, $A\mathbf{x} = \mathbf{b}_2$, $A\mathbf{x} = \mathbf{b}_3$ using the method of Example 2.


1.7 Diagonal, Triangular, and Symmetric Matrices 

In this section we will discuss matrices that have various special forms. These matrices arise 
in a wide variety of applications and will play an important role in our subsequent work. 


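Diagonal Matrices

A square matrix in which all the entries off the main diagonal are zero is called a diagonal matrix. Here are some examples:

\[
\begin{bmatrix} 2 & 0 \\ 0 & -5 \end{bmatrix}, \quad
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad
\begin{bmatrix} 6 & 0 & 0 & 0 \\ 0 & -4 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 8 \end{bmatrix}, \quad
\begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}
\]

A general $n \times n$ diagonal matrix D can be written as

\[
D = \begin{bmatrix}
d_1 & 0 & \cdots & 0 \\
0 & d_2 & \cdots & 0 \\
\vdots & \vdots & & \vdots \\
0 & 0 & \cdots & d_n
\end{bmatrix} \tag{1}
\]

A diagonal matrix is invertible if and only if all of its diagonal entries are nonzero; in this case the inverse of (1) is

\[
D^{-1} = \begin{bmatrix}
1/d_1 & 0 & \cdots & 0 \\
0 & 1/d_2 & \cdots & 0 \\
\vdots & \vdots & & \vdots \\
0 & 0 & \cdots & 1/d_n
\end{bmatrix} \tag{2}
\]

You can verify that this is so by multiplying (1) and (2).

Confirm Formula (2) by showing that $DD^{-1} = D^{-1}D = I$.

Powers of diagonal matrices are easy to compute; we leave it for you to verify that if D is the diagonal matrix (1) and k is a positive integer, then

\[
D^k = \begin{bmatrix}
d_1^k & 0 & \cdots & 0 \\
0 & d_2^k & \cdots & 0 \\
\vdots & \vdots & & \vdots \\
0 & 0 & \cdots & d_n^k
\end{bmatrix} \tag{3}
\]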
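
Formulas (2) and (3) say that inverting or powering a diagonal matrix only requires working on the diagonal entries one at a time. The sketch below illustrates this by storing a diagonal matrix as the list of its diagonal entries; that representation and the function names `diag_inverse` and `diag_power` are assumptions made for the illustration.

```python
from fractions import Fraction

def diag_inverse(d):
    """Diagonal of D^{-1}, given the diagonal d of D (Formula (2))."""
    if any(entry == 0 for entry in d):
        raise ValueError("a diagonal matrix with a zero diagonal entry is not invertible")
    return [1 / Fraction(entry) for entry in d]

def diag_power(d, k):
    """Diagonal of D^k, given the diagonal d of D (Formula (3)).

    A negative k also works when every diagonal entry is nonzero.
    """
    return [Fraction(entry) ** k for entry in d]

d = [1, -3, 2]                 # sample diagonal entries
print(diag_inverse(d))
print(diag_power(d, 5))
print(diag_power(d, -5))
```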


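► EXAMPLE 1 Inverses and Powers of Diagonal Matrices

If

\[
A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & -3 & 0 \\ 0 & 0 & 2 \end{bmatrix}
\]

then

\[
A^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & -\tfrac{1}{3} & 0 \\ 0 & 0 & \tfrac{1}{2} \end{bmatrix}, \quad
A^{5} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & -243 & 0 \\ 0 & 0 & 32 \end{bmatrix}, \quad
A^{-5} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & -\tfrac{1}{243} & 0 \\ 0 & 0 & \tfrac{1}{32} \end{bmatrix}
\]

◄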


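Matrix products that involve diagonal factors are especially easy to compute. For example,

\[
\begin{bmatrix} d_1 & 0 & 0 \\ 0 & d_2 & 0 \\ 0 & 0 & d_3 \end{bmatrix}
\begin{bmatrix}
a_{11} & a_{12} & a_{13} & a_{14} \\
a_{21} & a_{22} & a_{23} & a_{24} \\
a_{31} & a_{32} & a_{33} & a_{34}
\end{bmatrix} =
\begin{bmatrix}
d_1a_{11} & d_1a_{12} & d_1a_{13} & d_1a_{14} \\
d_2a_{21} & d_2a_{22} & d_2a_{23} & d_2a_{24} \\
d_3a_{31} & d_3a_{32} & d_3a_{33} & d_3a_{34}
\end{bmatrix}
\]

\[
\begin{bmatrix}
a_{11} & a_{12} & a_{13} \\
a_{21} & a_{22} & a_{23} \\
a_{31} & a_{32} & a_{33} \\
a_{41} & a_{42} & a_{43}
\end{bmatrix}
\begin{bmatrix} d_1 & 0 & 0 \\ 0 & d_2 & 0 \\ 0 & 0 & d_3 \end{bmatrix} =
\begin{bmatrix}
d_1a_{11} & d_2a_{12} & d_3a_{13} \\
d_1a_{21} & d_2a_{22} & d_3a_{23} \\
d_1a_{31} & d_2a_{32} & d_3a_{33} \\
d_1a_{41} & d_2a_{42} & d_3a_{43}
\end{bmatrix}
\]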

In words, to multiply a matrix A on the left by a diagonal matrix D, multiply successive rows of A by the successive diagonal entries of D, and to multiply A on the right by D, multiply successive columns of A by the successive diagonal entries of D.


Triangular Matrices

A square matrix in which all the entries above the main diagonal are zero is called lower triangular, and a square matrix in which all the entries below the main diagonal are zero is called upper triangular. A matrix that is either upper triangular or lower triangular is called triangular.

► EXAMPLE 2 Upper and Lower Triangular Matrices

\[
\begin{bmatrix}
a_{11} & a_{12} & a_{13} & a_{14} \\
0 & a_{22} & a_{23} & a_{24} \\
0 & 0 & a_{33} & a_{34} \\
0 & 0 & 0 & a_{44}
\end{bmatrix}
\qquad
\begin{bmatrix}
a_{11} & 0 & 0 & 0 \\
a_{21} & a_{22} & 0 & 0 \\
a_{31} & a_{32} & a_{33} & 0 \\
a_{41} & a_{42} & a_{43} & a_{44}
\end{bmatrix}
\]

The first is a general $4 \times 4$ upper triangular matrix; the second is a general $4 \times 4$ lower triangular matrix.


Remark Observe that diagonal matrices are both upper triangular and lower triangular since they have zeros below and above the main diagonal. Observe also that a square matrix in row echelon form is upper triangular since it has zeros below the main diagonal.


Properties of Triangular Matrices

Example 2 illustrates the following four facts about triangular matrices that we will state without formal proof:

A square matrix $A = [a_{ij}]$ is upper triangular if and only if all entries to the left of the main diagonal are zero; that is, $a_{ij} = 0$ if $i > j$ (Figure 1.7.1).

A square matrix $A = [a_{ij}]$ is lower triangular if and only if all entries to the right of the main diagonal are zero; that is, $a_{ij} = 0$ if $i < j$ (Figure 1.7.1).

A square matrix $A = [a_{ij}]$ is upper triangular if and only if the ith row starts with at least $i - 1$ zeros for every i.

A square matrix $A = [a_{ij}]$ is lower triangular if and only if the jth column starts with at least $j - 1$ zeros for every j.

▲ Figure 1.7.1

The following theorem lists some of the basic properties of triangular matrices. 


THEOREM 1.7.1 

(a) The transpose of a lower triangular matrix is upper triangular, and the transpose 
of an upper triangular matrix is lower triangular. 

(b) The product of lower triangular matrices is lower triangular, and the product of 
upper triangular matrices is upper triangular. 

(c) A triangular matrix is invertible if and only if its diagonal entries are all nonzero. 

(d) The inverse of an invertible lower triangular matrix is lower triangular, and the inverse of an invertible upper triangular matrix is upper triangular.


Part (a) is evident from the fact that transposing a square matrix can be accomplished by reflecting the entries about the main diagonal; we omit the formal proof. We will prove (b), but we will defer the proofs of (c) and (d) to the next chapter, where we will have the tools to prove those results more efficiently.

Proof (b) We will prove the result for lower triangular matrices; the proof for upper triangular matrices is similar. Let $A = [a_{ij}]$ and $B = [b_{ij}]$ be lower triangular $n \times n$ matrices, and let $C = [c_{ij}]$ be the product $C = AB$. We can prove that C is lower triangular by showing that $c_{ij} = 0$ for $i < j$. But from the definition of matrix multiplication,

\[
c_{ij} = a_{i1}b_{1j} + a_{i2}b_{2j} + \cdots + a_{in}b_{nj}
\]

If we assume that $i < j$, then the terms in this expression can be grouped as follows:

\[
c_{ij} = \underbrace{a_{i1}b_{1j} + a_{i2}b_{2j} + \cdots + a_{i(j-1)}b_{(j-1)j}}_{\substack{\text{Terms in which the row number of } b \\ \text{is less than the column number of } b}}
+ \underbrace{a_{ij}b_{jj} + \cdots + a_{in}b_{nj}}_{\substack{\text{Terms in which the row number of } a \\ \text{is less than the column number of } a}}
\]

In the first grouping all of the b factors are zero since B is lower triangular, and in the second grouping all of the a factors are zero since A is lower triangular. Thus, $c_{ij} = 0$, which is what we wanted to prove.
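
The statement just proved can be illustrated numerically. The following sketch uses two sample lower triangular matrices (assumed values, not taken from the text) and hypothetical helper names `matmul` and `is_lower_triangular`; it checks that the product has zeros above the main diagonal.

```python
def matmul(A, B):
    """Ordinary matrix product of two lists-of-rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def is_lower_triangular(M):
    """True when every entry with row index i < column index j is zero."""
    n = len(M)
    return all(M[i][j] == 0 for i in range(n) for j in range(i + 1, n))

A = [[2, 0, 0], [1, 3, 0], [4, -1, 5]]     # sample lower triangular matrices
B = [[1, 0, 0], [-2, 4, 0], [3, 6, -1]]
C = matmul(A, B)

print(C)
print(is_lower_triangular(C))
```

A single example is of course not a proof; the grouping argument above is what shows the result for every pair of lower triangular matrices.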


Observe that in Example 3 the 
diagonal entries of AB and 
BA are the same, and in both 
cases they are the products 
of the corresponding diagonal 
entries of A and B. In the 
exercises we will ask you to 
prove that this happens when- 
ever two upper triangular ma- 
trices or two lower triangular 
matrices are multiplied. 


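► EXAMPLE 3 Computations with Triangular Matrices

Consider the upper triangular matrices

\[
A = \begin{bmatrix} 1 & 3 & -1 \\ 0 & 2 & 4 \\ 0 & 0 & 5 \end{bmatrix}, \quad
B = \begin{bmatrix} 3 & -2 & 2 \\ 0 & 0 & -1 \\ 0 & 0 & 1 \end{bmatrix}
\]

It follows from part (c) of Theorem 1.7.1 that the matrix A is invertible but the matrix B is not. Moreover, the theorem also tells us that $A^{-1}$, AB, and BA must be upper triangular. We leave it for you to confirm these three statements by showing that

\[
A^{-1} = \begin{bmatrix} 1 & -\tfrac{3}{2} & \tfrac{7}{5} \\ 0 & \tfrac{1}{2} & -\tfrac{2}{5} \\ 0 & 0 & \tfrac{1}{5} \end{bmatrix}, \quad
AB = \begin{bmatrix} 3 & -2 & -2 \\ 0 & 0 & 2 \\ 0 & 0 & 5 \end{bmatrix}, \quad
BA = \begin{bmatrix} 3 & 5 & -1 \\ 0 & 0 & -5 \\ 0 & 0 & 5 \end{bmatrix}
\]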

Symmetric Matrices 


DEFINITION 1 A square matrix A is said to be symmetric if $A = A^T$.


It is easy to recognize a sym- 
metric matrix by inspection: 
The entries on the main diag- 
onal have no restrictions, but 
mirror images of entries across 
the main diagonal must be 
equal. Here is a picture using 
the second matrix in Exam- 
ple 4: 


► EXAMPLE 4 Symmetric Matrices

The following matrices are symmetric, since each is equal to its own transpose (verify).

\[
\begin{bmatrix}
d_1 & 0 & 0 & 0 \\
0 & d_2 & 0 & 0 \\
0 & 0 & d_3 & 0 \\
0 & 0 & 0 & d_4
\end{bmatrix}
\]



Remark It follows from Formula (14) of Section 1.3 that a square matrix A is symmetric if and 
only if 

(A)ij = (A)ji (4) 


for all values of i and j. 


The following theorem lists the main algebraic properties of symmetric matrices. The proofs are direct consequences of Theorem 1.4.8 and are omitted.


THEOREM 1.7.2 If A and B are symmetric matrices with the same size, and if k is any scalar, then:

(a) $A^T$ is symmetric.

(b) $A + B$ and $A - B$ are symmetric.

(c) $kA$ is symmetric.


It is not true, in general, that the product of symmetric matrices is symmetric. To see why this is so, let A and B be symmetric matrices with the same size. Then it follows from part (e) of Theorem 1.4.8 and the symmetry of A and B that

\[
(AB)^T = B^TA^T = BA
\]

Thus, $(AB)^T = AB$ if and only if $AB = BA$, that is, if and only if A and B commute. In summary, we have the following result.


THEOREM 1.7.3 The product of two symmetric matrices is symmetric if and only if the matrices commute.


► EXAMPLE 5 Products of Symmetric Matrices

The first of the following equations shows a product of symmetric matrices that is not symmetric, and the second shows a product of symmetric matrices that is symmetric. We conclude that the factors in the first equation do not commute, but those in the second equation do. We leave it for you to verify that this is so.

\[
\begin{bmatrix} 1 & 2 \\ 2 & 3 \end{bmatrix}
\begin{bmatrix} -4 & 1 \\ 1 & 0 \end{bmatrix} =
\begin{bmatrix} -2 & 1 \\ -5 & 2 \end{bmatrix}
\]

\[
\begin{bmatrix} 1 & 2 \\ 2 & 3 \end{bmatrix}
\begin{bmatrix} -4 & 3 \\ 3 & -1 \end{bmatrix} =
\begin{bmatrix} 2 & 1 \\ 1 & 3 \end{bmatrix}
\]


Invertibility of Symmetric Matrices

In general, a symmetric matrix need not be invertible. For example, a diagonal matrix with a zero on the main diagonal is symmetric but not invertible. However, the following theorem shows that if a symmetric matrix happens to be invertible, then its inverse must also be symmetric.


THEOREM 1.7.4 If A is an invertible symmetric matrix, then $A^{-1}$ is symmetric.


Proof Assume that A is symmetric and invertible. From Theorem 1.4.9 and the fact that $A = A^T$, we have

\[
(A^{-1})^T = (A^T)^{-1} = A^{-1}
\]

which proves that $A^{-1}$ is symmetric.


Products $AA^T$ and $A^TA$ are Symmetric

Matrix products of the form $AA^T$ and $A^TA$ arise in a variety of applications. If A is an $m \times n$ matrix, then $A^T$ is an $n \times m$ matrix, so the products $AA^T$ and $A^TA$ are both square matrices: the matrix $AA^T$ has size $m \times m$, and the matrix $A^TA$ has size $n \times n$. Such products are always symmetric since

\[
(AA^T)^T = (A^T)^TA^T = AA^T \quad \text{and} \quad (A^TA)^T = A^T(A^T)^T = A^TA
\]

► EXAMPLE 6 The Product of a Matrix and Its Transpose Is Symmetric

Let A be the $2 \times 3$ matrix

\[
A = \begin{bmatrix} 1 & -2 & 4 \\ 3 & 0 & -5 \end{bmatrix}
\]

Then

\[
A^TA = \begin{bmatrix} 1 & 3 \\ -2 & 0 \\ 4 & -5 \end{bmatrix}
\begin{bmatrix} 1 & -2 & 4 \\ 3 & 0 & -5 \end{bmatrix} =
\begin{bmatrix} 10 & -2 & -11 \\ -2 & 4 & -8 \\ -11 & -8 & 41 \end{bmatrix}
\]

\[
AA^T = \begin{bmatrix} 1 & -2 & 4 \\ 3 & 0 & -5 \end{bmatrix}
\begin{bmatrix} 1 & 3 \\ -2 & 0 \\ 4 & -5 \end{bmatrix} =
\begin{bmatrix} 21 & -17 \\ -17 & 34 \end{bmatrix}
\]

Observe that $A^TA$ and $AA^T$ are symmetric as expected. ◄
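
The computation in Example 6 is easy to reproduce. In the sketch below the helper names (`transpose`, `matmul`, `is_symmetric`) are hypothetical, and the $2 \times 3$ matrix is an assumption: it is a matrix whose products $A^TA$ and $AA^T$ match the two results shown in the example.

```python
def transpose(M):
    """Rows become columns."""
    return [list(col) for col in zip(*M)]

def matmul(A, B):
    """Ordinary matrix product of two lists-of-rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def is_symmetric(M):
    """A square matrix is symmetric when it equals its own transpose."""
    return M == transpose(M)

A = [[1, -2, 4], [3, 0, -5]]          # assumed 2 x 3 matrix consistent with Example 6
AtA = matmul(transpose(A), A)         # 3 x 3
AAt = matmul(A, transpose(A))         # 2 x 2

print(AtA, is_symmetric(AtA))
print(AAt, is_symmetric(AAt))
```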


Later in this text, we will obtain general conditions on A under which $AA^T$ and $A^TA$ are invertible. However, in the special case where A is square, we have the following result.


THEOREM 1.7.5 If A is an invertible matrix, then $AA^T$ and $A^TA$ are also invertible.


Proof Since A is invertible, so is $A^T$ by Theorem 1.4.9. Thus $AA^T$ and $A^TA$ are invertible, since they are the products of invertible matrices.


Exercise Set 1.7 


In Exercises 1—2, classify the matrix as upper triangular, lower 
triangular, or diagonal, and decide by inspection whether the ma- 
trix is invertible. [Note: Recall that a diagonal matrix is both up- 
per and lower triangular, so there may be more than one answer 
in some parts.] 


1. (a) 


2 

0 


1 

3 



0 

0 


(c) 


-1 

0 

0 


0 

2 

0 


0 

0 

1 

5- 


(d) 


-2 

0 

0 


7 
3 

8 







'-4 

0 

O' 





1 

2 

-5 







4. 





0 

3 

0 





-3 

-1 

0 












0 

0 

2_ 





'5 

0 

O' 


'-3 

2 


0 

4 

-4 

5. 

0 

2 

0 


1 

-5 


3 

0 

3 


0 

0 

-3 


-6 

2 


2 

2 

2 



"2 0 0" 


~ 4 — 1 3" 


'-3 0 0" 

6. 

0-1 0 


1 2 0 


0 5 0 


1 

O 

O 

4^ 

1 


-5 1 -2 


0 0 2 


2. (a) 


4 

1 


0 

7 



-3 

0 


In Exercises 7-10, find $A^2$, $A^{-2}$, and $A^{-k}$ (where k is any integer) by inspection.



"4 

0 

o' 


"3 

0 

O' 

(O 

0 

3 

5 

0 

(d) 

3 

1 

0 


0 

0 

-2 


7 

0 

0 


_ _ 


"-6 

0 

o~ 

1 

0 





0 

-2 

8. A = 

0 

3 

0 

L J 


0 

0 

5 


In Exercises 3-6, find the product by inspection. 



'3 0 O' 


■ 2 r 

3. 

0-1 0 


-4 1 


0 0 2 


2 5 






"-2 

0 

0 

o 

_l 

” 1 

0 

o" 






2 




0 

-4 

0 

0 

0 

1 

0 

ii 

o 






3 



0 

0 

-3 

0 

0 

0 

1 








4 _ 


0 

0 

0 

2

In Exercises 11-12, compute the product by inspection.



'1 

0 

o' 


"2 

0 

o' 


"0 

0 

o' 

11 . 

0 

0 

0 


0 

5 

0 


0 

2 

0 


0 

0 

3 


0 

0 

0 _ 


0 

0 

1 


In Exercises 23-24, find the diagonal entries of AB by inspec- 
tion. 



'3 

2 

6' 


'-1 

2 

1 

II 

cn 

0 

1 

-2 

, B = 

0 

5 

3 


0 

0 

-1 


0 

0 

6 



"-1 

0 

0" 


"3 

0 

0" 


"5 

0 

o' 

12. 

0 

2 

0 


0 

5 

0 


0 

-2 

0 


0 

0 

4_ 


0 

0 

7 


0 

0 

3 



4 

0 

o' 


'6 

0 

o' 

24. A = 

-2 

0 

0 

, B = 

1 

5 

0 


-3 

0 

7 


3 

2 

6 


In Exercises 13-14, compute the indicated quantity.



"l o’ 

39 

"l o" 

13. 

0 -1 

14. 

0 -1 


In Exercises 25-26, find all values of the unknown constant(s) for which A is symmetric.


25. A = 


4 

a -f 5 


-3 

-1 


In Exercises 15-16, use what you have learned in this section about multiplying by diagonal matrices to compute the product by inspection.


26. A = 


a —2b + 2c 
5 

-2 


2 a + b + c 
a + c 
1 



a 

0 

O' 


U 

V 


r 

s 

t 


a 

0 

0 

15. (a) 

0 

b 

0 


w 

X 

(b) 

u 

V 

w 


0 

b 

0 


_0 

0 

c 


_y 

z_ 


X 

y 

z_ 


_0 

0 

c 


u 

V 





a 

0 

0" 


r 

s 

f 


16. (a) 


w x 

.y z. 


a 0 
0 b 


(b) 


0 b 0 
0 0c 


u v 
x y 


In Exercises 17-18, create a symmetric matrix by substituting appropriate numbers for the ×'s.


In Exercises 27-28, find all values of x for which A is invertible. 


x — 1 


27. A = 


0 

0 


x + 2 x 3 
0 x-4 


28. A = 



0 

0 


x + 


l 

4 


17. (a) 


2 

x 


-1 

3 



x 

x 

0 

9 


x 

X 

X 

0 


29. If A is an invertible upper triangular or lower triangular matrix, what can you say about the diagonal entries of $A^{-1}$?

30. Show that if A is a symmetric n x n matrix and B is any n x m 
matrix, then the following products are symmetric: 


7 -3 2~ 

4 5-7 

x 1 -6 

x x 3_ 

In Exercises 19-22, determine by inspection whether the ma- 
trix is invertible. 


$B^TB$, $BB^T$, $B^TAB$

In Exercises 31-32, find a diagonal matrix A that satisfies the 
given condition. 



"i 

0 

0" 


"9 

0 

O' 

II 

0 

-1 

0 

32. A~ 2 = 

0 

4 

0 


0 

0 

-1 


0 

0 

1 


18. (a) 


0 x 
3 0 


(b) 



"o 

6 

-l" 



'-1 

2 

4' 


19. 

0 

7 

-4 


20. 

0 

3 

0 
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0 

-2 



0 

0 

5 



"1 

0 

0 

0" 


2 

0 

0 

O' 


2 

-5 

0 

0 


-3 

-1 

0 

0 

21. 

4 

-3 

4 

0 

22. 

-4 

-6 

0 

0 


1 

-2 

1 

3 


0 

3 

8 

-5 


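33. Verify Theorem 1.7.1(b) for the matrix product AB and Theorem 1.7.1(d) for the matrix A, where

\[
A = \begin{bmatrix} -1 & 2 & 5 \\ 0 & 1 & 3 \\ 0 & 0 & -4 \end{bmatrix}, \quad
B = \begin{bmatrix} 2 & -8 & 0 \\ 0 & 2 & 1 \\ 0 & 0 & 3 \end{bmatrix}
\]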


34. Let A be an $n \times n$ symmetric matrix.

(a) Show that $A^2$ is symmetric.

(b) Show that $2A^2 - 3A + I$ is symmetric.

35. Verify Theorem 1.7.4 for the given matrix A.

\[
A = \begin{bmatrix} 1 & -2 & 3 \\ -2 & 1 & -7 \\ 3 & -7 & 4 \end{bmatrix}
\]

36. Find all $3 \times 3$ diagonal matrices A that satisfy $A^2 - 3A - 4I = 0$.

37. Let $A = [a_{ij}]$ be an $n \times n$ matrix. Determine whether A is symmetric.

(a) $a_{ij} = i^2 + j^2$ (b) $a_{ij} = i^2 - j^2$

(c) $a_{ij} = 2i + 2j$ (d) $a_{ij} = 2i^2 + 2j^3$

38. On the basis of your experience with Exercise 37, devise a general test that can be applied to a formula for $a_{ij}$ to determine whether $A = [a_{ij}]$ is symmetric.

39. Find an upper triangular matrix that satisfies 



40. If the n x n matrix A can be expressed as A = LU , where L is 
a lower triangular matrix and U is an upper triangular matrix, 
then the linear system Ax = b can be expressed as LUx = b 
and can be solved in two steps: 

Step 1. Let Ux = y, so that LUx = b can be expressed as 
Ly = b. Solve this system. 

Step 2. Solve the system Ux = y for x. 

In each part, use this two-step method to solve the given 
system. 



1 0 o" 


"2 -1 3 " 


Xl 


l" 

(a) 

-2 3 0 


0 1 2 


X2 

= 

-2 


2 4 1 


1 

o 

o 

1 


X} 


0 



1 

O 

O 

<N 

1 


"3 -5 2 


Xl 


4 ~ 

(b) 

4 1 0 


0 4 1 


x 2 

= 

-5 


-3 -2 3 


0 0 2 


x 3 


2 


In the text we defined a matrix A to be symmetric if $A^T = A$. Analogously, a matrix A is said to be skew-symmetric if $A^T = -A$. Exercises 41-45 are concerned with matrices of this type.

41. Fill in the missing entries (marked with x) so the matrix A is 
skew-symmetric. 


(a) A = 

X X 

0 x 

4 ~ 

X 

(b) A = 

"x 0 x" 

x x -4 


x -1 

X 


8 x x 


42. Find all values of a , b, c, and d for which A is skew-symmetric. 


43. We showed in the text that the product of symmetric matrices 
is symmetric if and only if the matrices commute. Is the prod- 
uct of commuting skew-symmetric matrices skew-symmetric? 
Explain. 

Working with Proofs 

44. Prove that every square matrix A can be expressed as the sum of a symmetric matrix and a skew-symmetric matrix. [Hint: Note the identity $A = \tfrac{1}{2}(A + A^T) + \tfrac{1}{2}(A - A^T)$.]

45. Prove the following facts about skew-symmetric matrices.

(a) If A is an invertible skew-symmetric matrix, then $A^{-1}$ is skew-symmetric.

(b) If A and B are skew-symmetric matrices, then so are $A^T$, $A + B$, $A - B$, and $kA$ for any scalar k.

46. Prove: If the matrices A and B are both upper triangular or both lower triangular, then the diagonal entries of both AB and BA are the products of the diagonal entries of A and B.

47. Prove: If $A^TA = A$, then A is symmetric and $A = A^2$.

True-False Exercises 

TF. In parts (a)-(m) determine whether the statement is true or 

false, and justify your answer. 

(a) The transpose of a diagonal matrix is a diagonal matrix. 

(b) The transpose of an upper triangular matrix is an upper tri- 
angular matrix. 

(c) The sum of an upper triangular matrix and a lower triangular 
matrix is a diagonal matrix. 

(d) All entries of a symmetric matrix are determined by the entries 
occurring on and above the main diagonal. 

(e) All entries of an upper triangular matrix are determined by 
the entries occurring on and above the main diagonal. 

(f ) The inverse of an invertible lower triangular matrix is an upper 
triangular matrix. 

(g) A diagonal matrix is invertible if and only if all of its diagonal 
entries are positive. 

(h) The sum of a diagonal matrix and a lower triangular matrix is 
a lower triangular matrix. 

(i) A matrix that is both symmetric and upper triangular must be 
a diagonal matrix. 

(j) If A and B are n x n matrices such that A + B is symmetric, 
then A and B are symmetric. 

(k) If A and B are n x n matrices such that A + B is upper trian- 
gular, then A and B are upper triangular. 


(a) A = 


2 -1 
-1 3 


(b) A = 


A = 


0 

-2 

-3 


2a — 3b + c 
0 
-5 


3fl — 56 4- 5c 
5a — 86 + 6c 
d 


(l) If A 2 is a symmetric matrix, then A is a symmetric matrix. 

(m) If $kA$ is a symmetric matrix for some $k \neq 0$, then A is a symmetric matrix.




Working with Technology

T1. Starting with the formula stated in Exercise T1 of Section 1.5, derive a formula for the inverse of the "block diagonal" matrix

\[
\begin{bmatrix} D_1 & 0 \\ 0 & D_2 \end{bmatrix}
\]

in which $D_1$ and $D_2$ are invertible, and use your result to compute the inverse of the matrix

\[
\begin{bmatrix}
1.24 & 2.37 & 0 & 0 \\
3.08 & -1.01 & 0 & 0 \\
0 & 0 & 2.76 & 4.92 \\
0 & 0 & 3.23 & 5.54
\end{bmatrix}
\]


1.8 Matrix Transformations

In this section we will introduce a special class of functions that arise from matrix 
multiplication. Such functions, called “matrix transformations,” are fundamental in the 
study of linear algebra and have important applications in physics, engineering, social 
sciences, and various branches of mathematics. 


Recall that in Section 1.1 we defined an "ordered n-tuple" to be a sequence of n real numbers, and we observed that a solution of a linear system in n unknowns, say

\[
x_1 = s_1, \quad x_2 = s_2, \quad \ldots, \quad x_n = s_n
\]

can be expressed as the ordered n-tuple

\[
(s_1, s_2, \ldots, s_n) \tag{1}
\]

Recall also that if n = 2, then the n-tuple is called an "ordered pair," and if n = 3, it is called an "ordered triple." For two ordered n-tuples to be regarded as the same, they must list the same numbers in the same order. Thus, for example, (1, 2) and (2, 1) are different ordered pairs.

The term "vector" is used in various ways in mathematics, physics, engineering, and other applications. The idea of viewing n-tuples as vectors will be discussed in more detail in Chapter 3, at which point we will also explain how this idea relates to the more familiar notion of a vector.

The set of all ordered n-tuples of real numbers is denoted by the symbol $R^n$. The elements of $R^n$ are called vectors and are denoted in boldface type, such as $\mathbf{a}$, $\mathbf{b}$, $\mathbf{v}$, $\mathbf{w}$, and $\mathbf{x}$. When convenient, ordered n-tuples can be denoted in matrix notation as column vectors. For example, the matrix

\[
\begin{bmatrix} s_1 \\ s_2 \\ \vdots \\ s_n \end{bmatrix} \tag{2}
\]

can be used as an alternative to (1). We call (1) the comma-delimited form of a vector and (2) the column-vector form. For each $i = 1, 2, \ldots, n$, let $\mathbf{e}_i$ denote the vector in $R^n$ with a 1 in the ith position and zeros elsewhere. In column form these vectors are

\[
\mathbf{e}_1 = \begin{bmatrix} 1 \\ 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \quad
\mathbf{e}_2 = \begin{bmatrix} 0 \\ 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \quad \ldots, \quad
\mathbf{e}_n = \begin{bmatrix} 0 \\ 0 \\ 0 \\ \vdots \\ 1 \end{bmatrix}
\]

We call the vectors $\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n$ the standard basis vectors for $R^n$. For example, the vectors

\[
\mathbf{e}_1 = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \quad
\mathbf{e}_2 = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}, \quad
\mathbf{e}_3 = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}
\]

are the standard basis vectors for $R^3$.

The vectors $\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n$ in $R^n$ are termed "basis vectors" because all other vectors in $R^n$ are expressible in exactly one way as a linear combination of them. For example, if

\[
\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}
\]

then we can express $\mathbf{x}$ as

\[
\mathbf{x} = x_1\mathbf{e}_1 + x_2\mathbf{e}_2 + \cdots + x_n\mathbf{e}_n
\]


Functions and Transformations

Recall that a function is a rule that associates with each element of a set A one and only one element in a set B. If f associates the element b with the element a, then we write

\[
b = f(a)
\]

and we say that b is the image of a under f or that f(a) is the value of f at a. The set A is called the domain of f and the set B the codomain of f (Figure 1.8.1). The subset of the codomain that consists of all images of elements in the domain is called the range of f.

▲ Figure 1.8.1 (a function f carrying an element a of the domain A to b = f(a) in the codomain B)

In many applications the domain and codomain of a function are sets of real numbers, but in this text we will be concerned with functions for which the domain is $R^n$ and the codomain is $R^m$ for some positive integers m and n.


DEFINITION 1 If f is a function with domain $R^n$ and codomain $R^m$, then we say that f is a transformation from $R^n$ to $R^m$ or that f maps from $R^n$ to $R^m$, which we denote by writing

\[
f: R^n \to R^m
\]

In the special case where m = n, a transformation is sometimes called an operator on $R^n$.


Matrix Transformations

It is common in linear algebra to use the letter T to denote a transformation. In keeping with this usage, we will usually denote a transformation from $R^n$ to $R^m$ by writing

\[
T: R^n \to R^m
\]


In this section we will be concerned with the class of transformations from $R^n$ to $R^m$
that arise from linear systems. Specifically, suppose that we have the system of linear
equations

$$\begin{aligned}
w_1 &= a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n \\
w_2 &= a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n \\
&\;\;\vdots \\
w_m &= a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n
\end{aligned} \tag{3}$$

which we can write in matrix notation as

$$\begin{bmatrix} w_1 \\ w_2 \\ \vdots \\ w_m \end{bmatrix} =
\begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} \tag{4}$$

or more briefly as

$$\mathbf{w} = A\mathbf{x} \tag{5}$$

Although we could view (5) as a compact way of writing linear system (3), we will view
it instead as a transformation that maps a vector x in $R^n$ into the vector w in $R^m$ by
multiplying x on the left by A. We call this a matrix transformation (or matrix operator
in the special case where m = n). We denote it by

$$T_A: R^n \to R^m$$

(see Figure 1.8.2). This notation is useful when it is important to make the domain
and codomain clear. The subscript on $T_A$ serves as a reminder that the transformation
results from multiplying vectors in $R^n$ by the matrix A. In situations where specifying
the domain and codomain is not essential, we will express (4) as

$$\mathbf{w} = T_A(\mathbf{x}) \tag{6}$$

▲ Figure 1.8.2 [Diagram of $T_A: R^n \to R^m$ carrying vectors in $R^n$ to vectors in $R^m$.]

We call the transformation $T_A$ multiplication by A. On occasion we will find it convenient
to express (6) in the schematic form

$$\mathbf{x} \xrightarrow{\;T_A\;} \mathbf{w} \tag{7}$$

which is read “$T_A$ maps x into w.”


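► EXAMPLE 1 A Matrix Transformation from $R^4$ to $R^3$

The transformation from $R^4$ to $R^3$ defined by the equations

$$\begin{aligned}
w_1 &= 2x_1 - 3x_2 + x_3 - 5x_4 \\
w_2 &= 4x_1 + x_2 - 2x_3 + x_4 \\
w_3 &= 5x_1 - x_2 + 4x_3
\end{aligned} \tag{8}$$

can be expressed in matrix form as

$$\begin{bmatrix} w_1 \\ w_2 \\ w_3 \end{bmatrix} =
\begin{bmatrix} 2 & -3 & 1 & -5 \\ 4 & 1 & -2 & 1 \\ 5 & -1 & 4 & 0 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} \tag{9}$$

from which we see that the transformation can be interpreted as multiplication by

$$A = \begin{bmatrix} 2 & -3 & 1 & -5 \\ 4 & 1 & -2 & 1 \\ 5 & -1 & 4 & 0 \end{bmatrix}$$

Although the image under the transformation $T_A$ of any vector

$$\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix}$$

in $R^4$ could be computed directly from the defining equations in (8), we will find it
preferable to use the matrix in (9). For example, if

$$\mathbf{x} = \begin{bmatrix} 1 \\ -3 \\ 0 \\ 2 \end{bmatrix}$$

then it follows from (9) that

$$\begin{bmatrix} w_1 \\ w_2 \\ w_3 \end{bmatrix} = T_A(\mathbf{x}) = A\mathbf{x} =
\begin{bmatrix} 2 & -3 & 1 & -5 \\ 4 & 1 & -2 & 1 \\ 5 & -1 & 4 & 0 \end{bmatrix}
\begin{bmatrix} 1 \\ -3 \\ 0 \\ 2 \end{bmatrix} =
\begin{bmatrix} 1 \\ 3 \\ 8 \end{bmatrix}$$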
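The computation in Example 1 can also be checked mechanically. The following is a minimal Python sketch, not part of the text: the helper name `mat_vec` is our own, the matrix is the one read off from equations (8), and the input vector is assumed to be x = (1, -3, 0, 2).

```python
def mat_vec(A, x):
    """Return the product Ax, where A is a list of rows and x is a list."""
    if any(len(row) != len(x) for row in A):
        raise ValueError("each row of A must have as many entries as x")
    # Each entry of Ax is the dot product of one row of A with x.
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

# Coefficient matrix of the equations in (8).
A = [[2, -3, 1, -5],
     [4, 1, -2, 1],
     [5, -1, 4, 0]]
x = [1, -3, 0, 2]   # assumed input vector in R^4

w = mat_vec(A, x)
print(w)  # [1, 3, 8]
```

The same helper works for any m x n matrix, which is one reason the matrix form (9) is more convenient than substituting into the equations one at a time.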
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Properties of Matrix 
Transformations 


► EXAMPLE 2 Zero Transformations 

If 0 is the m × n zero matrix, then

$$T_0(\mathbf{x}) = 0\mathbf{x} = \mathbf{0}$$

so multiplication by zero maps every vector in $R^n$ into the zero vector in $R^m$. We call $T_0$
the zero transformation from $R^n$ to $R^m$.


► EXAMPLE 3 Identity Operators

If I is the n × n identity matrix, then

$$T_I(\mathbf{x}) = I\mathbf{x} = \mathbf{x}$$

so multiplication by I maps every vector in $R^n$ to itself. We call $T_I$ the identity operator
on $R^n$.


The following theorem lists four basic properties of matrix transformations that follow 
from properties of matrix multiplication. 


THEOREM 1.8.1 For every matrix A the matrix transformation $T_A: R^n \to R^m$ has the
following properties for all vectors u and v and for every scalar k:

(a) $T_A(\mathbf{0}) = \mathbf{0}$

(b) $T_A(k\mathbf{u}) = kT_A(\mathbf{u})$ [Homogeneity property]

(c) $T_A(\mathbf{u} + \mathbf{v}) = T_A(\mathbf{u}) + T_A(\mathbf{v})$ [Additivity property]

(d) $T_A(\mathbf{u} - \mathbf{v}) = T_A(\mathbf{u}) - T_A(\mathbf{v})$


Proof All four parts are restatements of the following properties of matrix arithmetic
given in Theorem 1.4.1:

$$A\mathbf{0} = \mathbf{0}, \quad A(k\mathbf{u}) = k(A\mathbf{u}), \quad A(\mathbf{u} + \mathbf{v}) = A\mathbf{u} + A\mathbf{v}, \quad A(\mathbf{u} - \mathbf{v}) = A\mathbf{u} - A\mathbf{v}$$

It follows from parts (b) and (c) of Theorem 1.8.1 that a matrix transformation maps
a linear combination of vectors in $R^n$ into the corresponding linear combination of
vectors in $R^m$ in the sense that

$$T_A(k_1\mathbf{u}_1 + k_2\mathbf{u}_2 + \cdots + k_r\mathbf{u}_r) = k_1T_A(\mathbf{u}_1) + k_2T_A(\mathbf{u}_2) + \cdots + k_rT_A(\mathbf{u}_r) \tag{10}$$


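Matrix transformations are not the only kinds of transformations. For example, if

$$\begin{aligned}
w_1 &= x_1^2 + x_2^2 \\
w_2 &= x_1x_2
\end{aligned} \tag{11}$$

then there are no constants a, b, c, and d for which

$$\begin{bmatrix} w_1 \\ w_2 \end{bmatrix} =
\begin{bmatrix} a & b \\ c & d \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} =
\begin{bmatrix} x_1^2 + x_2^2 \\ x_1x_2 \end{bmatrix}$$

so that the equations in (11) do not define a matrix transformation from $R^2$ to $R^2$.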
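One way to see concretely why (11) cannot come from a matrix is to test the homogeneity property in Theorem 1.8.1(b), which every matrix transformation must satisfy. The sketch below is illustrative only: it assumes the map in (11) is $w_1 = x_1^2 + x_2^2$, $w_2 = x_1x_2$, and the test vector u = (1, 1) and scalar k = 2 are our own choices.

```python
def T(x1, x2):
    """The map defined by the equations in (11)."""
    return (x1**2 + x2**2, x1 * x2)

u = (1, 1)
k = 2

lhs = T(k * u[0], k * u[1])          # T(ku)
rhs = tuple(k * c for c in T(*u))    # k T(u)

print(lhs)         # (8, 4)
print(rhs)         # (4, 2)
print(lhs == rhs)  # False, so homogeneity fails
```

A single failing pair (k, u) is enough to rule out a matrix transformation. Passing the test for a few sample vectors, on the other hand, would prove nothing.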
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This leads us to the following two questions. 


Question 1. Are there algebraic properties of a transformation $T: R^n \to R^m$ that can
be used to determine whether T is a matrix transformation?

Question 2. If we discover that a transformation $T: R^n \to R^m$ is a matrix transformation,
how can we find a matrix for it?


The following theorem and its proof will provide the answers. 


THEOREM 1.8.2 $T: R^n \to R^m$ is a matrix transformation if and only if the following
relationships hold for all vectors u and v in $R^n$ and for every scalar k:

(i) $T(\mathbf{u} + \mathbf{v}) = T(\mathbf{u}) + T(\mathbf{v})$ [Additivity property]

(ii) $T(k\mathbf{u}) = kT(\mathbf{u})$ [Homogeneity property]


Proof If T is a matrix transformation, then properties (i) and (ii) follow respectively
from parts (c) and (b) of Theorem 1.8.1.

Conversely, assume that properties (i) and (ii) hold. We must show that there exists
an m × n matrix A such that

$$T(\mathbf{x}) = A\mathbf{x}$$

for every vector x in $R^n$. Recall that the derivation of Formula (10) used only the
additivity and homogeneity properties of $T_A$. Since we are assuming that T has those
properties, it must be true that

$$T(k_1\mathbf{u}_1 + k_2\mathbf{u}_2 + \cdots + k_r\mathbf{u}_r) = k_1T(\mathbf{u}_1) + k_2T(\mathbf{u}_2) + \cdots + k_rT(\mathbf{u}_r) \tag{12}$$

for all scalars $k_1, k_2, \ldots, k_r$ and all vectors $\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_r$ in $R^n$. Let A be the matrix

$$A = [\,T(\mathbf{e}_1) \mid T(\mathbf{e}_2) \mid \cdots \mid T(\mathbf{e}_n)\,] \tag{13}$$

where $\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n$ are the standard basis vectors for $R^n$. It follows from Theorem 1.3.1
that Ax is a linear combination of the columns of A in which the successive coefficients
are the entries $x_1, x_2, \ldots, x_n$ of x. That is,

$$A\mathbf{x} = x_1T(\mathbf{e}_1) + x_2T(\mathbf{e}_2) + \cdots + x_nT(\mathbf{e}_n)$$

Using Formula (12) we can rewrite this as

$$A\mathbf{x} = T(x_1\mathbf{e}_1 + x_2\mathbf{e}_2 + \cdots + x_n\mathbf{e}_n) = T(\mathbf{x})$$

which completes the proof.

The additivity and homogeneity properties in Theorem 1.8.2 are called linearity
conditions, and a transformation that satisfies these conditions is called a linear
transformation. Using this terminology Theorem 1.8.2 can be restated as follows.


THEOREM 1.8.3 Every linear transformation from $R^n$ to $R^m$ is a matrix transformation,
and conversely, every matrix transformation from $R^n$ to $R^m$ is a linear transformation.


Theorem 1.8.3 tells us that for transformations from $R^n$ to $R^m$, the terms “matrix
transformation” and “linear transformation” are synonymous.





A Procedure for Finding 
Standard Matrices 


Depending on whether n-tuples and m-tuples are regarded as vectors or points, the
geometric effect of a matrix transformation $T_A: R^n \to R^m$ is to map each vector (point)
in $R^n$ into a vector (point) in $R^m$ (Figure 1.8.3).

► Figure 1.8.3 [Two panels: $T_A$ maps vectors to vectors; $T_A$ maps points to points.]

The following theorem states that if two matrix transformations from $R^n$ to $R^m$ have
the same image at each point of $R^n$, then the matrices themselves must be the same.


THEOREM 1.8.4 If $T_A: R^n \to R^m$ and $T_B: R^n \to R^m$ are matrix transformations, and if
$T_A(\mathbf{x}) = T_B(\mathbf{x})$ for every vector x in $R^n$, then A = B.


Proof To say that $T_A(\mathbf{x}) = T_B(\mathbf{x})$ for every vector in $R^n$ is the same as saying that

$$A\mathbf{x} = B\mathbf{x}$$

for every vector x in $R^n$. This will be true, in particular, if x is any of the standard basis
vectors $\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n$ for $R^n$; that is,

$$A\mathbf{e}_j = B\mathbf{e}_j \quad (j = 1, 2, \ldots, n) \tag{14}$$

Since every entry of $\mathbf{e}_j$ is 0 except for the jth, which is 1, it follows from Theorem 1.3.1
that $A\mathbf{e}_j$ is the jth column of A and $B\mathbf{e}_j$ is the jth column of B. Thus, (14) implies that
corresponding columns of A and B are the same, and hence that A = B.

Theorem 1.8.4 is significant because it tells us that there is a one-to-one correspondence
between m × n matrices and matrix transformations from $R^n$ to $R^m$ in the sense that
every m × n matrix A produces exactly one matrix transformation (multiplication by A)
and every matrix transformation from $R^n$ to $R^m$ arises from exactly one m × n matrix;
we call that matrix the standard matrix for the transformation.


In the course of proving Theorem 1.8.2 we showed in Formula (13) that if $\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n$
are the standard basis vectors for $R^n$ (in column form), then the standard matrix for a
linear transformation $T: R^n \to R^m$ is given by the formula

$$A = [\,T(\mathbf{e}_1) \mid T(\mathbf{e}_2) \mid \cdots \mid T(\mathbf{e}_n)\,] \tag{15}$$

This suggests the following procedure for finding standard matrices.


Finding the Standard Matrix for a Matrix Transformation

Step 1. Find the images of the standard basis vectors $\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n$ for $R^n$.

Step 2. Construct the matrix that has the images obtained in Step 1 as its successive
columns. This matrix is the standard matrix for the transformation.
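The two steps above translate almost directly into code. The following Python sketch is one possible rendering, not a prescribed implementation: the function name `standard_matrix` and the list-of-rows representation are our own choices, and the sample map is the transformation $T(x_1, x_2) = (2x_1 + x_2,\; x_1 - 3x_2,\; -x_1 + x_2)$ treated in Example 4.

```python
def standard_matrix(T, n):
    """Return [T(e_1) | ... | T(e_n)] as a list of rows.

    T is assumed to be linear and to map a length-n list to a length-m list.
    """
    columns = []
    for j in range(n):
        # j-th standard basis vector of R^n
        e_j = [1 if i == j else 0 for i in range(n)]
        columns.append(T(e_j))   # Step 1: image of e_j
    m = len(columns[0])
    # Step 2: the images become the successive columns of the matrix.
    return [[columns[j][i] for j in range(n)] for i in range(m)]

def T(x):
    """Sample linear map from R^2 to R^3 (the map of Example 4)."""
    x1, x2 = x
    return [2 * x1 + x2, x1 - 3 * x2, -x1 + x2]

A = standard_matrix(T, 2)
print(A)  # [[2, 1], [1, -3], [-1, 1]]
```

Note that the procedure only produces a meaningful matrix when T really is linear; applied to a nonlinear map it still returns a matrix, but multiplication by that matrix will not reproduce T.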
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Although we could have ob- 
tained the result in Example 5 
by substituting values for the 
variables in (13), the method 
used in Example 5 is preferable 
for large-scale problems in that 
matrix multiplication is better 
suited for computer computa- 
tions. 


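► EXAMPLE 4 Finding a Standard Matrix

Find the standard matrix A for the linear transformation $T: R^2 \to R^3$ defined by the
formula

$$T\left(\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}\right) =
\begin{bmatrix} 2x_1 + x_2 \\ x_1 - 3x_2 \\ -x_1 + x_2 \end{bmatrix} \tag{16}$$

Solution We leave it for you to verify that

$$T(\mathbf{e}_1) = T\left(\begin{bmatrix} 1 \\ 0 \end{bmatrix}\right) = \begin{bmatrix} 2 \\ 1 \\ -1 \end{bmatrix}
\quad \text{and} \quad
T(\mathbf{e}_2) = T\left(\begin{bmatrix} 0 \\ 1 \end{bmatrix}\right) = \begin{bmatrix} 1 \\ -3 \\ 1 \end{bmatrix}$$

Thus, it follows from Formulas (15) and (16) that the standard matrix is

$$A = [\,T(\mathbf{e}_1) \mid T(\mathbf{e}_2)\,] = \begin{bmatrix} 2 & 1 \\ 1 & -3 \\ -1 & 1 \end{bmatrix}$$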

► EXAMPLE 5 Computing with Standard Matrices 

For the linear transformation in Example 4, use the standard matrix A obtained in that 
example to find 



Solution The transformation is multiplication by A, so 



For transformation problems posed in comma-delimited form, a good procedure is 
to rewrite the problem in column-vector form and use the methods previously illustrated. 


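► EXAMPLE 6 Finding a Standard Matrix

Rewrite the transformation $T(x_1, x_2) = (3x_1 + x_2,\; 2x_1 - 4x_2)$ in column-vector form
and find its standard matrix.


Solution

$$T\left(\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}\right) =
\begin{bmatrix} 3x_1 + x_2 \\ 2x_1 - 4x_2 \end{bmatrix} =
\begin{bmatrix} 3 & 1 \\ 2 & -4 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$$

Thus, the standard matrix is

$$\begin{bmatrix} 3 & 1 \\ 2 & -4 \end{bmatrix}$$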


Remark This section is but a first step in the study of linear transformations, which is one of the 
major themes in this text. We will delve deeper into this topic in Chapter 4, at which point we will 
have more background and a richer source of examples to work with. 




Exercise Set 1.8 


In Exercises 1-2, find the domain and codomain of the transformation $T_A(\mathbf{x}) = A\mathbf{x}$.

1. (a) A has size 3 × 2.
   (b) A has size 2 × 3.
   (c) A has size 3 × 3.
   (d) A has size 1 × 6.

2. (a) A has size 4 × 5.
   (b) A has size 5 × 4.
   (c) A has size 4 × 4.
   (d) A has size 3 × 1.


In Exercises 3-4, find the domain and codomain of the transformation defined by the equations.

3. (a) $w_1 = 4x_1 + 5x_2$
       $w_2 = x_1 - 8x_2$

   (b) $w_1 = 5x_1 - 7x_2$
       $w_2 = 6x_1 + x_2$
       $w_3 = 2x_1 + 3x_2$

4. (a) $w_1 = x_1 - 4x_2 + 8x_3$
       $w_2 = -x_1 + 4x_2 + 2x_3$
       $w_3 = -3x_1 + 2x_2 - 5x_3$

   (b) $w_1 = 2x_1 + 7x_2 - 4x_3$
       $w_2 = 4x_1 - 3x_2 + 2x_3$

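In Exercises 5-6, find the domain and codomain of the transformation defined by the matrix product.

5. (a) $\begin{bmatrix} 3 & 1 & 2 \\ 6 & 7 & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}$

   (b) $\begin{bmatrix} 2 & -1 \\ 4 & 3 \\ 2 & -5 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$

6. (a) $\begin{bmatrix} 6 & 3 \\ -1 & 7 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$

   (b) [Matrix product only partly legible in the source; the surviving entries are the rows 1 -6, 7 -4, and 0 3.]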


In Exercises 7-8, find the domain and codomain of the transformation T defined by the formula.

7. (a) $T(x_1, x_2) = (2x_1 - x_2,\; x_1 + x_2)$

   (b) $T(x_1, x_2, x_3) = (4x_1 + x_2,\; x_1 + x_2)$

8. (a) $T(x_1, x_2, x_3, x_4) = (x_1, x_2)$

   (b) $T(x_1, x_2, x_3) = (x_1,\; x_2 - x_3,\; x_2)$

In Exercises 9-10, find the domain and codomain of the transformation T defined by the formula.

9. $T\left(\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}\right) = \begin{bmatrix} 4x_1 \\ x_1 - x_2 \\ 3x_2 \end{bmatrix}$

10. $T\left(\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}\right) = \begin{bmatrix} x_1 \\ x_2 \\ x_1 - x_3 \\ 0 \end{bmatrix}$

In Exercises 11-12, find the standard matrix for the transformation defined by the equations.

11. (a) $w_1 = 2x_1 - 3x_2 + x_3$
        $w_2 = 3x_1 + 5x_2 - x_3$

    (b) $w_1 = 7x_1 + 2x_2 - 8x_3$
        $w_2 = -x_2 + 5x_3$
        $w_3 = 4x_1 + 7x_2 - x_3$

12. (a) $w_1 = -x_1 + x_2$
        $w_2 = 3x_1 - 2x_2$
        $w_3 = 5x_1 - 7x_2$

    (b) $w_1 = x_1$
        $w_2 = x_1 + x_2$
        $w_3 = x_1 + x_2 + x_3$
        $w_4 = x_1 + x_2 + x_3 + x_4$


13. Find the standard matrix for the transformation T defined by
the formula.

(a) $T(x_1, x_2) = (x_2,\; -x_1,\; x_1 + 3x_2,\; x_1 - x_2)$

(b) $T(x_1, x_2, x_3, x_4) = (7x_1 + 2x_2 - x_3 + x_4,\; x_2 + x_3,\; -x_1)$

(c) $T(x_1, x_2, x_3) = (0, 0, 0, 0, 0)$

(d) $T(x_1, x_2, x_3, x_4) = (x_4,\; x_1,\; x_3,\; x_2,\; x_1 - x_3)$

14. Find the standard matrix for the operator T defined by the
formula.

(a) $T(x_1, x_2) = (2x_1 - x_2,\; x_1 + x_2)$

(b) $T(x_1, x_2) = (x_1, x_2)$

(c) $T(x_1, x_2, x_3) = (x_1 + 2x_2 + x_3,\; x_1 + 5x_2,\; x_3)$

(d) $T(x_1, x_2, x_3) = (4x_1,\; 7x_2,\; -8x_3)$

15. Find the standard matrix for the operator $T: R^3 \to R^3$ defined
by

$w_1 = 3x_1 + 5x_2 - x_3$
$w_2 = 4x_1 - x_2 + x_3$
$w_3 = 3x_1 + 2x_2 - x_3$

and then compute T(-1, 2, 4) by directly substituting in the
equations and then by matrix multiplication.

16. Find the standard matrix for the transformation $T: R^4 \to R^2$
defined by

$w_1 = 2x_1 + 3x_2 - 5x_3 - x_4$
$w_2 = x_1 - 5x_2 + 2x_3 - 3x_4$

and then compute T(1, -1, 2, 4) by directly substituting in
the equations and then by matrix multiplication.

In Exercises 17-18, find the standard matrix for the transformation
and use it to compute T(x). Check your result by substituting
directly in the formula for T.

17. (a) $T(x_1, x_2) = (-x_1 + x_2,\; x_2)$; x = (-1, 4)

    (b) $T(x_1, x_2, x_3) = (2x_1 - x_2 + x_3,\; x_2 + x_3,\; 0)$; x = (2, 1, -3)

18. (a) $T(x_1, x_2) = (2x_1 - x_2,\; x_1 + x_2)$; x = (-2, 2)

    (b) $T(x_1, x_2, x_3) = (x_1,\; x_2 - x_3,\; x_2)$; x = (1, 0, 5)

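In Exercises 19-20, find $T_A(\mathbf{x})$, and express your answer in
matrix form.

19. (a) $A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}$; $\mathbf{x} = \begin{bmatrix} 3 \\ -2 \end{bmatrix}$

    (b) $A = \begin{bmatrix} -1 & 2 & 0 \\ 3 & 1 & 5 \end{bmatrix}$; $\mathbf{x} = \begin{bmatrix} -1 \\ 1 \\ 3 \end{bmatrix}$

20. (a) $A = \begin{bmatrix} -2 & 1 & 4 \\ 3 & 5 & 7 \\ 6 & 0 & -1 \end{bmatrix}$; $\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}$

    (b) $A = \begin{bmatrix} -1 & 1 \\ 2 & 4 \\ 7 & 8 \end{bmatrix}$; $\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$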

In Exercises 21-22, use Theorem 1.8.2 to show that T is a
matrix transformation.

21. (a) $T(x, y) = (2x + y,\; x - y)$

    (b) $T(x_1, x_2, x_3) = (x_1,\; x_3,\; x_1 + x_2)$

22. (a) $T(x, y, z) = (x + y,\; y + z,\; x)$

    (b) $T(x_1, x_2) = (x_2, x_1)$


30. We proved in the text that if $T: R^n \to R^m$ is a matrix transformation,
then T(0) = 0. Show that the converse of this result
is false by finding a mapping $T: R^n \to R^m$ that is not a matrix
transformation but for which T(0) = 0.


31. Let $T_A: R^3 \to R^3$ be multiplication by

$$A = \begin{bmatrix} -1 & 3 & 0 \\ 2 & 1 & 2 \\ 4 & 5 & -3 \end{bmatrix}$$

and let $\mathbf{e}_1$, $\mathbf{e}_2$, and $\mathbf{e}_3$ be the standard basis vectors for $R^3$. Find
the following vectors by inspection.

(a) $T_A(\mathbf{e}_1)$, $T_A(\mathbf{e}_2)$, and $T_A(\mathbf{e}_3)$

(b) $T_A(\mathbf{e}_1 + \mathbf{e}_2 + \mathbf{e}_3)$

(c) $T_A(7\mathbf{e}_3)$


In Exercises 23-24, use Theorem 1.8.2 to show that T is not a
matrix transformation.

23. (a) $T(x, y) = (x^2, y)$

    (b) $T(x, y, z) = (x, y, xz)$

24. (a) $T(x, y) = (x, y + 1)$

    (b) $T(x_1, x_2, x_3) = (x_1, x_2, \sqrt{x_3})$


Working with Proofs

32. (a) Prove: If $T: R^n \to R^m$ is a matrix transformation, then
T(0) = 0; that is, T maps the zero vector in $R^n$ into the
zero vector in $R^m$.

(b) The converse of this is not true. Find an example of a
function T for which T(0) = 0 but which is not a matrix
transformation.


25. A function of the form f(x) = mx + b is commonly called a
“linear function” because the graph of y = mx + b is a line.
Is f a matrix transformation on R?

26. Show that T(x, y) = (0, 0) defines a matrix operator on $R^2$
but T(x, y) = (1, 1) does not.

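In Exercises 27-28, the images of the standard basis vectors
for $R^3$ are given for a linear transformation $T: R^3 \to R^3$.
Find the standard matrix for the transformation, and find
T(x).

27. $T(\mathbf{e}_1) = \begin{bmatrix} 1 \\ 3 \\ 0 \end{bmatrix}$, $T(\mathbf{e}_2) = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}$, $T(\mathbf{e}_3) = \begin{bmatrix} 4 \\ -3 \\ -1 \end{bmatrix}$; $\mathbf{x} = \begin{bmatrix} 2 \\ 1 \\ 0 \end{bmatrix}$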


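28. $T(\mathbf{e}_1) = \begin{bmatrix} 2 \\ 1 \\ 3 \end{bmatrix}$, $T(\mathbf{e}_2) = \begin{bmatrix} 3 \\ \ast \\ 0 \end{bmatrix}$, $T(\mathbf{e}_3) = \begin{bmatrix} 1 \\ 0 \\ 2 \end{bmatrix}$; $\mathbf{x} = \begin{bmatrix} 3 \\ 2 \\ 1 \end{bmatrix}$

[The entry marked ∗ is missing in the source.]

29. Let $T: R^2 \to R^2$ be a linear operator for which the images
of the standard basis vectors for $R^2$ are $T(\mathbf{e}_1) = (a, b)$ and
$T(\mathbf{e}_2) = (c, d)$. Find T(1, 1).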


True-False Exercises 

TF. In parts (a)-(g) determine whether the statement is true or
false, and justify your answer.

(a) If A is a 2 × 3 matrix, then the domain of the transformation
$T_A$ is $R^2$.

(b) If A is an m × n matrix, then the codomain of the transformation
$T_A$ is $R^n$.

(c) There is at least one linear transformation $T: R^n \to R^m$ for
which T(2x) = 4T(x) for some vector x in $R^n$.

(d) There are linear transformations from $R^n$ to $R^m$ that are not
matrix transformations.

(e) If $T_A: R^n \to R^n$ and if $T_A(\mathbf{x}) = \mathbf{0}$ for every vector x in $R^n$, then
A is the n × n zero matrix.

(f) There is only one matrix transformation $T: R^n \to R^m$ such that
T(-x) = -T(x) for every vector x in $R^n$.

(g) If b is a nonzero vector in $R^n$, then T(x) = x + b is a matrix
operator on $R^n$.
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1.9 

Applications of Linear Systems 


In this section we will discuss some brief applications of linear systems. These are but a 
small sample of the wide variety of real-world problems to which our study of linear 
systems is applicable. 

Network Analysis 

The concept of a network appears in a variety of applications. Loosely stated, a network 
is a set of branches through which something “flows.” For example, the branches might 
be electrical wires through which electricity flows, pipes through which water or oil flows, 
traffic lanes through which vehicular traffic flows, or economic linkages through which 
money flows, to name a few possibilities. 

In most networks, the branches meet at points, called nodes or junctions, where the 
flow divides. For example, in an electrical network, nodes occur where three or more wires 
join, in a traffic network they occur at street intersections, and in a financial network 
they occur at banking centers where incoming money is distributed to individuals or 
other institutions. 

In the study of networks, there is generally some numerical measure of the rate at 
which the medium flows through a branch. For example, the flow rate of electricity is 
often measured in amperes, the flow rate of water or oil in gallons per minute, the flow rate 
of traffic in vehicles per hour, and the flow rate of European currency in millions of Euros 
per day. We will restrict our attention to networks in which there is flow conservation at 
each node, by which we mean that the rate of flow into any node is equal to the rate of flow 
out of that node. This ensures that the flow medium does not build up at the nodes and 
block the free movement of the medium through the network. 

A common problem in network analysis is to use known flow rates in certain branches 
to find the flow rates in all of the branches. Here is an example. 

► EXAMPLE 1 Network Analysis Using Linear Systems

Figure 1.9.1 shows a network with four nodes in which the flow rate and direction of
flow in certain branches are known. Find the flow rates and directions of flow in the
remaining branches.

▲ Figure 1.9.1 [Network with four nodes; the known flow rates shown are 30, 35, 55, 15, and 60.]

Solution As illustrated in Figure 1.9.2, we have assigned arbitrary directions to the
unknown flow rates $x_1$, $x_2$, and $x_3$. We need not be concerned if some of the directions
are incorrect, since an incorrect direction will be signaled by a negative value for the flow
rate when we solve for the unknowns.

▲ Figure 1.9.2 [The network of Figure 1.9.1 with the unknown flow rates $x_1$, $x_2$, $x_3$ and their assigned directions added.]

It follows from the conservation of flow at node A that

$$x_1 + x_2 = 30$$

Similarly, at the other nodes we have

$$\begin{aligned}
x_2 + x_3 &= 35 && \text{(node B)} \\
x_3 + 15 &= 60 && \text{(node C)} \\
x_1 + 15 &= 55 && \text{(node D)}
\end{aligned}$$

These four conditions produce the linear system

$$\begin{aligned}
x_1 + x_2 &= 30 \\
x_2 + x_3 &= 35 \\
x_3 &= 45 \\
x_1 &= 40
\end{aligned}$$
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which we can now try to solve for the unknown flow rates. In this particular case the 
system is sufficiently simple that it can be solved by inspection (work from the bottom 
up). We leave it for you to confirm that the solution is 

$$x_1 = 40, \quad x_2 = -10, \quad x_3 = 45$$

The fact that $x_2$ is negative tells us that the direction assigned to that flow in Figure 1.9.2
is incorrect; that is, the flow in that branch is into node A.
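The "work from the bottom up" argument can be mirrored in a few lines of code. The sketch below is only an illustration; the variable names and the way the equations are stored are our own, and the system is assumed to be the four node equations $x_1 + x_2 = 30$, $x_2 + x_3 = 35$, $x_3 = 45$, $x_1 = 40$.

```python
# Each equation is stored as (coefficients of x1, x2, x3), right-hand side.
equations = [
    ((1, 1, 0), 30),  # node A: x1 + x2 = 30
    ((0, 1, 1), 35),  # node B: x2 + x3 = 35
    ((0, 0, 1), 45),  # node C: x3 = 45
    ((1, 0, 0), 40),  # node D: x1 = 40
]

# Work from the bottom up: the last two equations give x1 and x3 directly.
x1 = 40
x3 = 45
x2 = 35 - x3          # node B then determines x2

solution = (x1, x2, x3)

# There are four equations but only three unknowns, so confirm that the
# candidate also satisfies the equation that was not used (node A).
consistent = all(
    sum(c * v for c, v in zip(coeffs, solution)) == rhs
    for coeffs, rhs in equations
)
print(solution, consistent)  # (40, -10, 45) True
```

The final check matters because the system is overdetermined: three of the equations fix the unknowns, and the fourth must then hold automatically if flow is conserved in the network as a whole.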


► EXAMPLE 2 Design of Traffic Patterns

The network in Figure 1.9.3 shows a proposed plan for the traffic flow around a new 
park that will house the Liberty Bell in Philadelphia, Pennsylvania. The plan calls for a 
computerized traffic light at the north exit on Fifth Street, and the diagram indicates the 
average number of vehicles per hour that are expected to flow in and out of the streets 
that border the complex. All streets are one-way. 

(a) How many vehicles per hour should the traffic light let through to ensure that the 
average number of vehicles per hour flowing into the complex is the same as the 
average number of vehicles flowing out? 

(b) Assuming that the traffic light has been set to balance the total flow in and out of 
the complex, what can you say about the average number of vehicles per hour that 
will flow along the streets that border the complex? 


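► Figure 1.9.3 [Two-panel diagram, (a) and (b), of the one-way streets around Liberty Park, including Market St. and Chestnut St., with the traffic light at the north exit; the labeled flows are 500, 200, 700, 400, 400, and 600 vehicles per hour.]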


Solution (a) If, as indicated in Figure 1.9.3b, we let x denote the number of vehicles per
hour that the traffic light must let through, then the total number of vehicles per hour
that flow in and out of the complex will be

Flowing in: 500 + 400 + 600 + 200 = 1700
Flowing out: x + 700 + 400

Equating the flows in and out shows that the traffic light should let x = 600 vehicles per
hour pass through.

Solution (b) To avoid traffic congestion, the flow in must equal the flow out at each
intersection. For this to happen, the following conditions must be satisfied:

Intersection    Flow In          Flow Out
A               400 + 600   =    x1 + x2
B               x2 + x3     =    400 + x
C               500 + 200   =    x3 + x4
D               x1 + x4     =    700
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Thus, with x = 600, as computed in part (a), we obtain the following linear system: 

$$\begin{aligned}
x_1 + x_2 &= 1000 \\
x_2 + x_3 &= 1000 \\
x_3 + x_4 &= 700 \\
x_1 + x_4 &= 700
\end{aligned}$$

We leave it for you to show that the system has infinitely many solutions and that these
are given by the parametric equations

$$x_1 = 700 - t, \quad x_2 = 300 + t, \quad x_3 = 700 - t, \quad x_4 = t \tag{1}$$

However, the parameter t is not completely arbitrary here, since there are physical
constraints to be considered. For example, the average flow rates must be nonnegative since
we have assumed the streets to be one-way, and a negative flow rate would indicate a flow
in the wrong direction. This being the case, we see from (1) that t can be any real number
that satisfies $0 \le t \le 700$, which implies that the average flow rates along the streets will
fall in the ranges

$$0 \le x_1 \le 700, \quad 300 \le x_2 \le 1000, \quad 0 \le x_3 \le 700, \quad 0 \le x_4 \le 700$$
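The distinction between solving the system and satisfying the physical constraints can be illustrated numerically. In the sketch below, which is ours rather than the text's, `flows` evaluates the parametric solution (1) and `satisfies_system` checks the four intersection equations; the sample values of t are arbitrary.

```python
def flows(t):
    """Parametric solution (1): returns (x1, x2, x3, x4) for the parameter t."""
    return (700 - t, 300 + t, 700 - t, t)

def satisfies_system(x1, x2, x3, x4):
    """Check the four intersection equations obtained with x = 600."""
    return (x1 + x2 == 1000 and x2 + x3 == 1000
            and x3 + x4 == 700 and x1 + x4 == 700)

# Every t solves the linear system, but only 0 <= t <= 700 keeps
# all four flow rates nonnegative.
for t in (-100, 0, 350, 700, 800):
    x = flows(t)
    print(t, x, satisfies_system(*x), all(v >= 0 for v in x))
```

For t = -100 and t = 800 the equations still hold, but one or more flow rates are negative, which would mean traffic moving the wrong way on a one-way street.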


Electrical Circuits 



▲ Figure 1.9.4 [Circuit with one battery, one resistor, and a switch.]


Next we will show how network analysis can be used to analyze electrical circuits
consisting of batteries and resistors. A battery is a source of electric energy, and a resistor,
such as a lightbulb, is an element that dissipates electric energy. Figure 1.9.4 shows a
schematic diagram of a circuit with one battery (drawn as a pair of parallel lines of
unequal length), one resistor (drawn as a zigzag line), and a switch. The battery has a
positive pole (+) and a negative pole (−). When the switch is closed, electrical current is
considered to flow from the positive pole of the battery, through the resistor, and back to
the negative pole (indicated by the arrowhead in the figure).

Electrical current, which is a flow of electrons through wires, behaves much like the
flow of water through pipes. A battery acts like a pump that creates “electrical pressure”
to increase the flow rate of electrons, and a resistor acts like a restriction in a pipe that
reduces the flow rate of electrons. The technical term for electrical pressure is electrical
potential; it is commonly measured in volts (V). The degree to which a resistor reduces the
electrical potential is called its resistance and is commonly measured in ohms (Ω). The
rate of flow of electrons in a wire is called current and is commonly measured in amperes
(also called amps) (A). The precise effect of a resistor is given by the following law:



▲ Figure 1.9.5 


Ohm's Law If a current of I amperes passes through a resistor with a resistance of 
R ohms, then there is a resulting drop of E volts in electrical potential that is the 
product of the current and resistance; that is, 

E = IR 


A typical electrical network will have multiple batteries and resistors joined by some 
configuration of wires. A point at which three or more wires in a network are joined is 
called a node (or junction point). A branch is a wire connecting two nodes, and a closed 
loop is a succession of connected branches that begin and end at the same node. For 
example, the electrical network in Figure 1.9.5 has two nodes and three closed loops — 
two inner loops and one outer loop. As current flows through an electrical network, it 
undergoes increases and decreases in electrical potential, called voltage rises and voltage 
drops , respectively. The behavior of the current at the nodes and around closed loops is 
governed by two fundamental laws: 
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Kirchhoff's Current Law The sum of the currents flowing into any node is equal to the 
sum of the currents flowing out. 



A Figure 1.9.6 
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A Figure 1.9.7 


Kirchhoff's Voltage Law In one traversal of any closed loop, the sum of the voltage 
rises equals the sum of the voltage drops. 


Kirchhoff’s current law is a restatement of the principle of flow conservation at a node
that was stated for general networks. Thus, for example, the currents at the top node in
Figure 1.9.6 satisfy the equation $I_1 = I_2 + I_3$.

In circuits with multiple loops and batteries there is usually no way to tell in advance
which way the currents are flowing, so the usual procedure in circuit analysis is to assign
arbitrary directions to the current flows in the branches and let the mathematical
computations determine whether the assignments are correct. In addition to assigning
directions to the current flows, Kirchhoff’s voltage law requires a direction of travel for
each closed loop. The choice is arbitrary, but for consistency we will always take this
direction to be clockwise (Figure 1.9.7). We also make the following conventions:

A voltage drop occurs at a resistor if the direction assigned to the current through the 
resistor is the same as the direction assigned to the loop, and a voltage rise occurs at 
a resistor if the direction assigned to the current through the resistor is the opposite 
to that assigned to the loop. 

A voltage rise occurs at a battery if the direction assigned to the loop is from — to + 
through the battery, and a voltage drop occurs at a battery if the direction assigned 
to the loop is from + to — through the battery. 

If you follow these conventions when calculating currents, then those currents whose 
directions were assigned correctly will have positive values and those whose directions 
were assigned incorrectly will have negative values. 


▲ Figure 1.9.8


► EXAMPLE 3 A Circuit with One Closed Loop 

Determine the current I in the circuit shown in Figure 1.9.8. 

Solution Since the direction assigned to the current through the resistor is the same
as the direction of the loop, there is a voltage drop at the resistor. By Ohm’s law this
voltage drop is E = IR = 3I. Also, since the direction assigned to the loop is from −
to + through the battery, there is a voltage rise of 6 volts at the battery. Thus, it follows
from Kirchhoff’s voltage law that

$$3I = 6$$

from which we conclude that the current is I = 2 A. Since I is positive, the direction
assigned to the current flow is correct.



A Figure 1.9.9 


► EXAMPLE 4 A Circuit with Three Closed Loops

Determine the currents $I_1$, $I_2$, and $I_3$ in the circuit shown in Figure 1.9.9.

Solution Using the assigned directions for the currents, Kirchhoff’s current law provides
one equation for each node:

Node    Current In        Current Out
A       I1 + I2      =    I3
B       I3           =    I1 + I2
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However, these equations are really the same, since both can be expressed as 

$$I_1 + I_2 - I_3 = 0 \tag{2}$$

To find unique values for the currents we will need two more equations, which we will
obtain from Kirchhoff’s voltage law. We can see from the network diagram that there
are three closed loops, a left inner loop containing the 50 V battery, a right inner loop
containing the 30 V battery, and an outer loop that contains both batteries. Thus,
Kirchhoff’s voltage law will actually produce three equations. With a clockwise traversal
of the loops, the voltage rises and drops in these loops are as follows:

                     Voltage Rises           Voltage Drops
Left Inside Loop     50                      5 I1 + 20 I3
Right Inside Loop    30 + 10 I2 + 20 I3      0
Outside Loop         30 + 50 + 10 I2         5 I1

These conditions can be rewritten as

$$\begin{aligned}
5I_1 + 20I_3 &= 50 \\
10I_2 + 20I_3 &= -30 \\
5I_1 - 10I_2 &= 80
\end{aligned} \tag{3}$$

However, the last equation is superfluous, since it is the difference of the first two. Thus,
if we combine (2) and the first two equations in (3), we obtain the following linear system
of three equations in the three unknown currents:

$$\begin{aligned}
I_1 + I_2 - I_3 &= 0 \\
5I_1 + 20I_3 &= 50 \\
10I_2 + 20I_3 &= -30
\end{aligned}$$

We leave it for you to show that the solution of this system in amps is $I_1 = 6$, $I_2 = -5$,
and $I_3 = 1$. The fact that $I_2$ is negative tells us that the direction of this current is opposite
to that indicated in Figure 1.9.9.
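For readers who would like to confirm the stated currents by machine, the following Python sketch solves the final three-equation system by Gauss-Jordan elimination with exact rational arithmetic. It is a minimal illustration under our own naming (`solve` is a hypothetical helper, not something defined in the text), and the system is assumed to be $I_1 + I_2 - I_3 = 0$, $5I_1 + 20I_3 = 50$, $10I_2 + 20I_3 = -30$.

```python
from fractions import Fraction

def solve(A, b):
    """Solve the square system Ax = b by Gauss-Jordan elimination."""
    n = len(A)
    # Augmented matrix [A | b] with exact fractions to avoid rounding error.
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero entry in this column.
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            raise ValueError("coefficient matrix is singular")
        M[col], M[pivot] = M[pivot], M[col]
        # Scale the pivot row to get a leading 1, then clear the rest of the column.
        pivot_value = M[col][col]
        M[col] = [v / pivot_value for v in M[col]]
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [a - factor * p for a, p in zip(M[r], M[col])]
    return [row[n] for row in M]

# Rows: Kirchhoff's current law (2), then the first two loop equations in (3).
A = [[1, 1, -1],
     [5, 0, 20],
     [0, 10, 20]]
b = [0, 50, -30]

I1, I2, I3 = solve(A, b)
print(I1, I2, I3)  # 6 -5 1
```

Exact fractions are used here so that the answer can be compared with the integers in the text without worrying about floating-point rounding.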

Gustav Kirchhoff
(1824-1887)

Historical Note The German physicist Gustav Kirchhoff was a student of Gauss.
His work on Kirchhoff's laws, announced in 1854, was a major advance in the
calculation of currents, voltages, and resistances of electrical circuits. Kirchhoff was
severely disabled and spent most of his life on crutches or in a wheelchair.

[Image: ullstein bild - histopics/akg-im]


Balancing Chemical
Equations


Chemical compounds are represented by chemical formulas that describe the atomic
makeup of their molecules. For example, water is composed of two hydrogen atoms and
one oxygen atom, so its chemical formula is H₂O; and stable oxygen is composed of two
oxygen atoms, so its chemical formula is O₂.

When chemical compounds are combined under the right conditions, the atoms in
their molecules rearrange to form new compounds. For example, when methane burns,
the methane (CH₄) and stable oxygen (O₂) react to form carbon dioxide (CO₂) and water
(H₂O). This is indicated by the chemical equation

CH₄ + O₂ → CO₂ + H₂O (4)

The molecules to the left of the arrow are called the reactants and those to the right 
the products. In this equation the plus signs serve to separate the molecules and are 
not intended as algebraic operations. However, this equation does not tell the whole 
story, since it fails to account for the proportions of molecules required for a complete 
reaction (no reactants left over). For example, we can see from the right side of (4) that 
to produce one molecule of carbon dioxide and one molecule of water, one needs three 
oxygen atoms for each carbon atom. However, from the left side of (4) we see that one 
molecule of methane and one molecule of stable oxygen have only two oxygen atoms 
for each carbon atom. Thus, on the reactant side the ratio of methane to stable oxygen 
cannot be one-to-one in a complete reaction. 

A chemical equation is said to be balanced if for each type of atom in the reaction, 
the same number of atoms appears on each side of the arrow. For example, the balanced 
version of Equation (4) is 

\[ \mathrm{CH_4 + 2O_2 \longrightarrow CO_2 + 2H_2O} \tag{5} \]

by which we mean that one methane molecule combines with two stable oxygen molecules 
to produce one carbon dioxide molecule and two water molecules. In theory, one could 
multiply this equation through by any positive integer. For example, multiplying through 
by 2 yields the balanced chemical equation 

\[ \mathrm{2CH_4 + 4O_2 \longrightarrow 2CO_2 + 4H_2O} \]

However, the standard convention is to use the smallest positive integers that will balance 
the equation. 

Equation (4) is sufficiently simple that it could have been balanced by trial and error, 
but for more complicated chemical equations we will need a systematic method. There 
are various methods that can be used, but we will give one that uses systems of linear 
equations. To illustrate the method let us reexamine Equation (4). To balance this 
equation we must find positive integers $x_1$, $x_2$, $x_3$, and $x_4$ such that

\[ x_1(\mathrm{CH_4}) + x_2(\mathrm{O_2}) \longrightarrow x_3(\mathrm{CO_2}) + x_4(\mathrm{H_2O}) \tag{6} \]

For each of the atoms in the equation, the number of atoms on the left must be equal to 
the number of atoms on the right. Expressing this in tabular form we have 

             Left Side        Right Side
Carbon       $x_1$        =   $x_3$
Hydrogen     $4x_1$       =   $2x_4$
Oxygen       $2x_2$       =   $2x_3 + x_4$

from which we obtain the homogeneous linear system

\[
\begin{aligned}
x_1 - x_3 &= 0 \\
4x_1 - 2x_4 &= 0 \\
2x_2 - 2x_3 - x_4 &= 0
\end{aligned}
\]

The augmented matrix for this system is

\[
\begin{bmatrix}
1 & 0 & -1 & 0 & 0 \\
4 & 0 & 0 & -2 & 0 \\
0 & 2 & -2 & -1 & 0
\end{bmatrix}
\]

We leave it for you to show that the reduced row echelon form of this matrix is

\[
\begin{bmatrix}
1 & 0 & 0 & -\tfrac{1}{2} & 0 \\
0 & 1 & 0 & -1 & 0 \\
0 & 0 & 1 & -\tfrac{1}{2} & 0
\end{bmatrix}
\]

from which we conclude that the general solution of the system is

\[ x_1 = t/2, \quad x_2 = t, \quad x_3 = t/2, \quad x_4 = t \]

where $t$ is arbitrary. The smallest positive integer values for the unknowns occur when
we let $t = 2$, so the equation can be balanced by letting $x_1 = 1$, $x_2 = 2$, $x_3 = 1$, $x_4 = 2$.
This agrees with our earlier conclusions, since substituting these values into Equation (6)
yields Equation (5).
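
As a quick sanity check on the balancing condition (the same number of atoms of each type on both sides), one might count atoms directly. This is only an illustrative sketch: the names `atom_totals` and `is_balanced`, and the dictionary encoding of each formula, are assumptions made here and are not part of the text's method.

```python
from collections import Counter

def atom_totals(side):
    """Total atoms of each element for a list of (coefficient, formula) pairs."""
    totals = Counter()
    for coeff, formula in side:
        for element, count in formula.items():
            totals[element] += coeff * count
    return totals

# Each molecule is encoded as {element: atoms per molecule}.
CH4 = {"C": 1, "H": 4}
O2 = {"O": 2}
CO2 = {"C": 1, "O": 2}
H2O = {"H": 2, "O": 1}

def is_balanced(x1, x2, x3, x4):
    """Check x1 CH4 + x2 O2 -> x3 CO2 + x4 H2O atom by atom."""
    left = atom_totals([(x1, CH4), (x2, O2)])
    right = atom_totals([(x3, CO2), (x4, H2O)])
    return left == right

print(is_balanced(1, 1, 1, 1))  # the unbalanced Equation (4)
print(is_balanced(1, 2, 1, 2))  # the coefficients of Equation (5)
```

The coefficients with every $x_i = 1$ fail the check because the oxygen counts differ (2 on the left, 3 on the right), which is the imbalance described for Equation (4).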


► EXAMPLE 5 Balancing Chemical Equations Using Linear Systems

Balance the chemical equation

\[ \mathrm{HCl + Na_3PO_4 \longrightarrow H_3PO_4 + NaCl} \]

[hydrochloric acid] + [sodium phosphate] → [phosphoric acid] + [sodium chloride]

Solution Let $x_1$, $x_2$, $x_3$, and $x_4$ be positive integers that balance the equation

\[ x_1(\mathrm{HCl}) + x_2(\mathrm{Na_3PO_4}) \longrightarrow x_3(\mathrm{H_3PO_4}) + x_4(\mathrm{NaCl}) \tag{7} \]


Equating the number of atoms of each type on the two sides yields

Hydrogen (H)      $1x_1 = 3x_3$
Chlorine (Cl)     $1x_1 = 1x_4$
Sodium (Na)       $3x_2 = 1x_4$
Phosphorus (P)    $1x_2 = 1x_3$
Oxygen (O)        $4x_2 = 4x_3$

from which we obtain the homogeneous linear system

\[
\begin{aligned}
x_1 - 3x_3 &= 0 \\
x_1 - x_4 &= 0 \\
3x_2 - x_4 &= 0 \\
x_2 - x_3 &= 0 \\
4x_2 - 4x_3 &= 0
\end{aligned}
\]


We leave it for you to show that the reduced row echelon form of the augmented matrix
for this system is

\[
\begin{bmatrix}
1 & 0 & 0 & -1 & 0 \\
0 & 1 & 0 & -\tfrac{1}{3} & 0 \\
0 & 0 & 1 & -\tfrac{1}{3} & 0 \\
0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0
\end{bmatrix}
\]

from which we conclude that the general solution of the system is

\[ x_1 = t, \quad x_2 = t/3, \quad x_3 = t/3, \quad x_4 = t \]

where $t$ is arbitrary. To obtain the smallest positive integers that balance the equation,
we let $t = 3$, in which case we obtain $x_1 = 3$, $x_2 = 1$, $x_3 = 1$, and $x_4 = 3$. Substituting
these values in (7) produces the balanced equation

\[ \mathrm{3HCl + Na_3PO_4 \longrightarrow H_3PO_4 + 3NaCl} \] ◄
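
The procedure used in this example (row reduce the homogeneous system, read off the one-parameter solution, and pick the smallest $t$ that clears fractions) can be sketched in code. The function names `rref` and `balance` are hypothetical, and the sketch assumes, as in both examples above, that the last unknown is the only free variable.

```python
from fractions import Fraction
from math import lcm

def rref(matrix):
    """Reduced row echelon form using exact rational arithmetic."""
    m = [[Fraction(v) for v in row] for row in matrix]
    rows, cols = len(m), len(m[0])
    pivot_row = 0
    for col in range(cols):
        pivot = next((r for r in range(pivot_row, rows) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[pivot_row], m[pivot] = m[pivot], m[pivot_row]
        lead = m[pivot_row][col]
        m[pivot_row] = [v / lead for v in m[pivot_row]]
        for r in range(rows):
            if r != pivot_row and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[pivot_row])]
        pivot_row += 1
        if pivot_row == rows:
            break
    return m

def balance(coeff):
    """Smallest positive integer solution of the homogeneous system coeff * x = 0.

    Assumption: the system has rank n - 1 with the last unknown free,
    so row i of the reduced matrix has its leading 1 in column i.
    """
    n = len(coeff[0])
    r = rref(coeff)
    solution = [-r[i][n - 1] for i in range(n - 1)] + [Fraction(1)]  # t = 1
    scale = lcm(*(v.denominator for v in solution))                   # clear fractions
    return [int(v * scale) for v in solution]

# Coefficient matrices (columns x1..x4) of the two homogeneous systems above.
methane = [[1, 0, -1, 0], [4, 0, 0, -2], [0, 2, -2, -1]]
example5 = [[1, 0, -3, 0], [1, 0, 0, -1], [0, 3, 0, -1], [0, 1, -1, 0], [0, 4, -4, 0]]

print(balance(methane))
print(balance(example5))
```

A reaction with more than one free variable would need a different treatment; the sketch does not detect that case.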


Polynomial Interpolation

An important problem in various applications is to find a polynomial whose graph passes
through a specified set of points in the plane; this is called an interpolating polynomial
for the points. The simplest example of such a problem is to find a linear polynomial

\[ p(x) = ax + b \tag{8} \]

whose graph passes through two known distinct points, $(x_1, y_1)$ and $(x_2, y_2)$, in the
xy-plane (Figure 1.9.10). You have probably encountered various methods in analytic
geometry for finding the equation of a line through two points, but here we will give a
method based on linear systems that can be adapted to general polynomial interpolation.

▲ Figure 1.9.10

The graph of (8) is the line $y = ax + b$, and for this line to pass through the points
$(x_1, y_1)$ and $(x_2, y_2)$, we must have

\[ y_1 = ax_1 + b \quad\text{and}\quad y_2 = ax_2 + b \]

Therefore, the unknown coefficients $a$ and $b$ can be obtained by solving the linear system

\[
\begin{aligned}
ax_1 + b &= y_1 \\
ax_2 + b &= y_2
\end{aligned}
\]


We don’t need any fancy methods to solve this system — the value of a can be obtained 
by subtracting the equations to eliminate b, and then the value of a can be substituted 
into either equation to find b. We leave it as an exercise for you to find a and b and then 
show that they can be expressed in the form 


\[
a = \frac{y_2 - y_1}{x_2 - x_1}
\quad\text{and}\quad
b = \frac{y_1 x_2 - y_2 x_1}{x_2 - x_1}
\tag{9}
\]

provided $x_1 \neq x_2$. Thus, for example, the line $y = ax + b$ that passes through the points

\[ (2, 1) \quad\text{and}\quad (5, 4) \]

can be obtained by taking $(x_1, y_1) = (2, 1)$ and $(x_2, y_2) = (5, 4)$, in which case (9) yields

\[
a = \frac{4 - 1}{5 - 2} = 1
\quad\text{and}\quad
b = \frac{(1)(5) - (4)(2)}{5 - 2} = -1
\]

Therefore, the equation of the line is

\[ y = x - 1 \]

(Figure 1.9.11).
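
Formula (9) translates directly into a few lines of code. The sketch below is illustrative only; the function name `line_through` is an assumption, and integer coordinates are assumed so that exact fractions can be used.

```python
from fractions import Fraction

def line_through(p1, p2):
    """Return (a, b) for the line y = a x + b through two points, using formula (9)."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        # Formula (9) requires distinct x-coordinates.
        raise ValueError("the x-coordinates must be distinct")
    a = Fraction(y2 - y1, x2 - x1)
    b = Fraction(y1 * x2 - y2 * x1, x2 - x1)
    return a, b

a, b = line_through((2, 1), (5, 4))
print(a, b)
```

For the points $(2, 1)$ and $(5, 4)$ this reproduces the slope and intercept computed above.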

Now let us consider the more general problem of finding a polynomial whose graph
passes through $n$ points with distinct x-coordinates

\[ (x_1, y_1),\ (x_2, y_2),\ (x_3, y_3),\ \ldots,\ (x_n, y_n) \tag{10} \]

Since there are $n$ conditions to be satisfied, intuition suggests that we should begin by
looking for a polynomial of the form

\[ p(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_{n-1} x^{n-1} \tag{11} \]

since a polynomial of this form has $n$ coefficients that are at our disposal to satisfy the
$n$ conditions. However, we want to allow for cases where the points may lie on a line or
have some other configuration that would make it possible to use a polynomial whose
degree is less than $n - 1$; thus, we allow for the possibility that $a_{n-1}$ and other coefficients
in (11) may be zero.

The following theorem, which we will prove later in the text, is the basic result on 
polynomial interpolation. 


Polynomial Interpolation 

Given any n points in the xy-plane that have distinct x-coordinates, there is a unique 
polynomial of degree n — 1 or less whose graph passes through those points. 


Let us now consider how we might go about finding the interpolating polynomial
(11) whose graph passes through the points in (10). Since the graph of this polynomial
is the graph of the equation

\[ y = a_0 + a_1 x + a_2 x^2 + \cdots + a_{n-1} x^{n-1} \tag{12} \]

it follows that the coordinates of the points must satisfy

\[
\begin{aligned}
a_0 + a_1 x_1 + a_2 x_1^2 + \cdots + a_{n-1} x_1^{n-1} &= y_1 \\
a_0 + a_1 x_2 + a_2 x_2^2 + \cdots + a_{n-1} x_2^{n-1} &= y_2 \\
&\ \ \vdots \\
a_0 + a_1 x_n + a_2 x_n^2 + \cdots + a_{n-1} x_n^{n-1} &= y_n
\end{aligned}
\tag{13}
\]

In these equations the values of the $x$'s and $y$'s are assumed to be known, so we can view
this as a linear system in the unknowns $a_0, a_1, \ldots, a_{n-1}$. From this point of view the
augmented matrix for the system is

\[
\begin{bmatrix}
1 & x_1 & x_1^2 & \cdots & x_1^{n-1} & y_1 \\
1 & x_2 & x_2^2 & \cdots & x_2^{n-1} & y_2 \\
\vdots & \vdots & \vdots & & \vdots & \vdots \\
1 & x_n & x_n^2 & \cdots & x_n^{n-1} & y_n
\end{bmatrix}
\tag{14}
\]

and hence the interpolating polynomial can be found by reducing this matrix to reduced
row echelon form (Gauss-Jordan elimination).


► EXAMPLE 6 Polynomial Interpolation by Gauss-Jordan Elimination 

Find a cubic polynomial whose graph passes through the points 

(1,3), (2,-2), (3,-5), (4,0) 

Solution Since there are four points, we will use an interpolating polynomial of degree 
n = 3. Denote this polynomial by 

\[ p(x) = a_0 + a_1 x + a_2 x^2 + a_3 x^3 \]

and denote the x- and y-coordinates of the given points by

\[ x_1 = 1,\ x_2 = 2,\ x_3 = 3,\ x_4 = 4 \quad\text{and}\quad y_1 = 3,\ y_2 = -2,\ y_3 = -5,\ y_4 = 0 \]

Thus, it follows from (14) that the augmented matrix for the linear system in the unknowns
$a_0$, $a_1$, $a_2$, and $a_3$ is

\[
\begin{bmatrix}
1 & x_1 & x_1^2 & x_1^3 & y_1 \\
1 & x_2 & x_2^2 & x_2^3 & y_2 \\
1 & x_3 & x_3^2 & x_3^3 & y_3 \\
1 & x_4 & x_4^2 & x_4^3 & y_4
\end{bmatrix}
=
\begin{bmatrix}
1 & 1 & 1 & 1 & 3 \\
1 & 2 & 4 & 8 & -2 \\
1 & 3 & 9 & 27 & -5 \\
1 & 4 & 16 & 64 & 0
\end{bmatrix}
\]

We leave it for you to confirm that the reduced row echelon form of this matrix is

\[
\begin{bmatrix}
1 & 0 & 0 & 0 & 4 \\
0 & 1 & 0 & 0 & 3 \\
0 & 0 & 1 & 0 & -5 \\
0 & 0 & 0 & 1 & 1
\end{bmatrix}
\]

from which it follows that $a_0 = 4$, $a_1 = 3$, $a_2 = -5$, $a_3 = 1$. Thus, the interpolating
polynomial is

\[ p(x) = 4 + 3x - 5x^2 + x^3 \]

▲ Figure 1.9.12

The graph of this polynomial and the given points are shown in Figure 1.9.12. 
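
The computation in Example 6 can be reproduced with a short program that builds the augmented matrix (14) and applies Gauss-Jordan elimination. This is a sketch under stated assumptions: the name `interpolate` is introduced here, and the points are assumed to have distinct x-coordinates so that a pivot always exists.

```python
from fractions import Fraction

def interpolate(points):
    """Coefficients [a0, a1, ..., a_{n-1}] of the interpolating polynomial.

    Builds the augmented matrix (14) and reduces it by Gauss-Jordan elimination.
    Assumes the x-coordinates are distinct, so every column has a pivot.
    """
    n = len(points)
    m = [[Fraction(x) ** k for k in range(n)] + [Fraction(y)] for x, y in points]
    for col in range(n):
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        lead = m[col][col]
        m[col] = [v / lead for v in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[col])]
    return [row[-1] for row in m]

def evaluate(coeffs, x):
    """Evaluate a0 + a1 x + ... + a_{n-1} x^{n-1}."""
    return sum(a * x ** k for k, a in enumerate(coeffs))

points = [(1, 3), (2, -2), (3, -5), (4, 0)]
coeffs = interpolate(points)
print([int(c) for c in coeffs])
```

Evaluating the resulting polynomial at the four given x-values returns the four given y-values, which is a convenient check on any interpolation routine.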


Remark Later we will give a more efficient method for finding interpolating polynomials that is 
better suited for problems in which the number of data points is large. 


► EXAMPLE 7 Approximate Integration 

There is no way to evaluate the integral

\[ \int_0^1 \sin\left(\frac{\pi x^2}{2}\right) dx \]

directly since there is no way to express an antiderivative of the integrand in terms of 
elementary functions. This integral could be approximated by Simpson’s rule or some 
comparable method, but an alternative approach is to approximate the integrand by an 
interpolating polynomial and integrate the approximating polynomial. For example, let 
us consider the five points 


\[ x_0 = 0,\quad x_1 = 0.25,\quad x_2 = 0.5,\quad x_3 = 0.75,\quad x_4 = 1 \]

that divide the interval [0, 1] into four equally spaced subintervals (Figure 1.9.13). The
values of

\[ f(x) = \sin\left(\frac{\pi x^2}{2}\right) \]

at these points are approximately

\[
f(0) = 0,\quad f(0.25) = 0.098017,\quad f(0.5) = 0.382683,\quad
f(0.75) = 0.77301,\quad f(1) = 1
\]

The interpolating polynomial is (verify)

\[ p(x) = 0.098796x + 0.762356x^2 + 2.14429x^3 - 2.00544x^4 \tag{15} \]

and

\[ \int_0^1 p(x)\,dx \approx 0.438501 \tag{16} \]

As shown in Figure 1.9.13, the graphs of $f$ and $p$ match very closely over the interval
[0, 1], so the approximation is quite good.

▲ Figure 1.9.13
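
The value in (16) follows from integrating (15) term by term, since $\int_0^1 a_k x^k\,dx = a_k/(k+1)$. The sketch below checks that arithmetic and compares $p$ with the integrand at the five sample points. The names `coeffs`, `p`, `f`, and `integral_0_1` are assumptions made for illustration, and the integrand $\sin(\pi x^2/2)$ is the one consistent with the tabulated values.

```python
import math

# Coefficients a0..a4 of the interpolating polynomial (15); a0 is zero.
coeffs = [0.0, 0.098796, 0.762356, 2.14429, -2.00544]

def p(x):
    """Evaluate the interpolating polynomial (15)."""
    return sum(a * x ** k for k, a in enumerate(coeffs))

def f(x):
    """Integrand assumed to match the tabulated values: sin(pi x^2 / 2)."""
    return math.sin(math.pi * x * x / 2)

def integral_0_1(c):
    """Exact integral over [0, 1] of the polynomial with coefficients c."""
    return sum(a / (k + 1) for k, a in enumerate(c))

approx = integral_0_1(coeffs)
print(round(approx, 6))

# p should agree with f at the five interpolation points, up to rounding of the coefficients.
for x in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(x, round(f(x), 6), round(p(x), 6))
```

Because the coefficients in (15) are rounded, the agreement at the nodes is close but not exact.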
Exercise Set 1.9

1. The accompanying figure shows a network in which the flow
rate and direction of flow in certain branches are known. Find
the flow rates and directions of flow in the remaining branches.

◄ Figure Ex-1

2. The accompanying figure shows known flow rates of hydrocarbons
into and out of a network of pipes at an oil refinery.

(a) Set up a linear system whose solution provides the unknown
flow rates.

(b) Solve the system for the unknown flow rates.

(c) Find the flow rates and directions of flow if $x_4 = 50$ and
$x_6 = 0$.

◄ Figure Ex-2

3. The accompanying figure shows a network of one-way streets
with traffic flowing in the directions indicated. The flow rates
along the streets are measured as the average number of vehicles
per hour.

(a) Set up a linear system whose solution provides the unknown
flow rates.

(b) Solve the system for the unknown flow rates.

(c) If the flow along the road from A to B must be reduced for
construction, what is the minimum flow that is required to
keep traffic flowing on all roads?

◄ Figure Ex-3

4. The accompanying figure shows a network of one-way streets
with traffic flowing in the directions indicated. The flow rates
along the streets are measured as the average number of vehicles
per hour.

(a) Set up a linear system whose solution provides the unknown
flow rates.

(b) Solve the system for the unknown flow rates.

(c) Is it possible to close the road from A to B for construction
and keep traffic flowing on the other streets? Explain.

In Exercises 5-8, analyze the given electrical circuits by finding
the unknown currents.

5. [Figure: circuit diagram; legible labels include 8 V and 20 Ω]
In Exercises 9-12, write a balanced equation for the given
chemical reaction.

9. $\mathrm{C_3H_8 + O_2 \longrightarrow CO_2 + H_2O}$ (propane combustion)

10. $\mathrm{C_6H_{12}O_6 \longrightarrow CO_2 + C_2H_5OH}$ (fermentation of sugar)

11. $\mathrm{CH_3COF + H_2O \longrightarrow CH_3COOH + HF}$

12. $\mathrm{CO_2 + H_2O \longrightarrow C_6H_{12}O_6 + O_2}$ (photosynthesis)

13. Find the quadratic polynomial whose graph passes through
the points (1, 1), (2, 2), and (3, 5).

14. Find the quadratic polynomial whose graph passes through
the points (0, 0), (-1, 1), and (1, 1).

15. Find the cubic polynomial whose graph passes through the
points (-1, -1), (0, 1), (1, 3), (4, -1).

16. The accompanying figure shows the graph of a cubic polynomial.
Find the polynomial.

17. (a) Find an equation that represents the family of all second-degree
polynomials that pass through the points (0, 1)
and (1, 2). [Hint: The equation will involve one arbitrary
parameter that produces the members of the family
when varied.]

(b) By hand, or with the help of a graphing utility, sketch
four curves in the family.

18. In this section we have selected only a few applications of lin- 
ear systems. Using the Internet as a search tool, try to find 
some more real-world applications of such systems. Select one 
that is of interest to you, and write a paragraph about it. 

True-False Exercises 

TF. In parts (a)-(e) determine whether the statement is true or 
false, and justify your answer. 

(a) In any network, the sum of the flows out of a node must equal 
the sum of the flows into a node. 


(b) When a current passes through a resistor, there is an increase 
in the electrical potential in a circuit. 

(c) Kirchhoff’s current law states that the sum of the currents 
flowing into a node equals the sum of the currents flowing out 
of the node. 

(d) A chemical equation is called balanced if the total number of 
atoms on each side of the equation is the same. 

(e) Given any n points in the xy-plane, there is a unique polyno- 
mial of degree n — 1 or less whose graph passes through those 
points. 

Working with Technology 

T1. The following table shows the lifting force on an aircraft wing
measured in a wind tunnel at various wind velocities. Model the
data with an interpolating polynomial of degree 5, and use that
polynomial to estimate the lifting force at 2000 ft/s.

Velocity (100 ft/s)       1      2       4       8       16      32
Lifting Force (100 lb)    0      3.12    15.86   33.7    81.5    123.0


T2. (Calculus required) Use the method of Example 7 to approximate
the integral

\[ \int_0^1 e^{x^2}\,dx \]

by subdividing the interval of integration into five equal parts and
using an interpolating polynomial to approximate the integrand.
Compare your answer to that obtained using the numerical integration
capability of your technology utility.

T3. Use the method of Example 5 to balance the chemical equation

\[ \mathrm{Fe_2O_3 + Al \longrightarrow Al_2O_3 + Fe} \]

(Fe = iron, Al = aluminum, O = oxygen)

T4. Determine the currents in the accompanying circuit.

[Figure: circuit diagram; legible labels include 12 V and 2 Ω]


1.10 Leontief Input-Output Models

In 1973 the economist Wassily Leontief was awarded the Nobel prize for his work on
economic modeling in which he used matrix methods to study the relationships among
different sectors in an economy. In this section we will discuss some of the ideas developed
by Leontief.

Inputs and Outputs in an Economy

One way to analyze an economy is to divide it into sectors and study how the sectors
interact with one another. For example, a simple economy might be divided into three
sectors: manufacturing, agriculture, and utilities. Typically, a sector will produce certain
outputs but will require inputs from the other sectors and itself. For example, the
agricultural sector may produce wheat as an output but will require inputs of farm machinery
from the manufacturing sector, electrical power from the utilities sector, and food
from its own sector to feed its workers. Thus, we can imagine an economy to be a network
in which inputs and outputs flow in and out of the sectors; the study of such flows
is called input-output analysis. Inputs and outputs are commonly measured in monetary
units (dollars or millions of dollars, for example) but other units of measurement are
also possible.

The flows between sectors of a real economy are not always obvious. For example,
in World War II the United States had a demand for 50,000 new airplanes that required
the construction of many new aluminum manufacturing plants. This produced an unexpectedly
large demand for certain copper electrical components, which in turn produced
a copper shortage. The problem was eventually resolved by using silver borrowed from
Fort Knox as a copper substitute. In all likelihood modern input-output analysis would
have anticipated the copper shortage.

Most sectors of an economy will produce outputs, but there may exist sectors that
consume outputs without producing anything themselves (the consumer market, for
example). Those sectors that do not produce outputs are called open sectors. Economies
with no open sectors are called closed economies, and economies with one or more open
sectors are called open economies (Figure 1.10.1). In this section we will be concerned with
economies with one open sector, and our primary goal will be to determine the output
levels that are required for the productive sectors to sustain themselves and satisfy the
demand of the open sector.

▲ Figure 1.10.1 (sectors shown: Manufacturing, Agriculture, Utilities)

Leontief Model of an Open Economy

Let us consider a simple open economy with one open sector and three product-producing
sectors: manufacturing, agriculture, and utilities. Assume that inputs and outputs are
measured in dollars and that the inputs required by the productive sectors to produce
one dollar's worth of output are in accordance with Table 1.



Wassily Leontief
(1906-1999)

It is somewhat ironic that it was the Russian-born Wassily Leontief who won the Nobel
prize in 1973 for pioneering the modern methods for analyzing free-market economies.
Leontief was a precocious student who entered the University of Leningrad at age 15.
Bothered by the intellectual restrictions of the Soviet system, he was put in jail
for anti-Communist activities, after which he headed for the University of Berlin,
receiving his Ph.D. there in 1928. He came to the United States in 1931, where
he held professorships at Harvard and then New York University.

[Image: © Bettmann/CORBIS]
Table 1

                       Input Required per Dollar Output

Provider           Manufacturing    Agriculture    Utilities

Manufacturing         $0.50            $0.10         $0.10

Agriculture           $0.20            $0.50         $0.30

Utilities             $0.10            $0.30         $0.40


Usually, one would suppress the labeling and express this matrix as

\[
C = \begin{bmatrix}
0.5 & 0.1 & 0.1 \\
0.2 & 0.5 & 0.3 \\
0.1 & 0.3 & 0.4
\end{bmatrix}
\tag{1}
\]

This is called the consumption matrix (or sometimes the technology matrix) for the economy.
The column vectors

\[
\mathbf{c}_1 = \begin{bmatrix} 0.5 \\ 0.2 \\ 0.1 \end{bmatrix},\quad
\mathbf{c}_2 = \begin{bmatrix} 0.1 \\ 0.5 \\ 0.3 \end{bmatrix},\quad
\mathbf{c}_3 = \begin{bmatrix} 0.1 \\ 0.3 \\ 0.4 \end{bmatrix}
\]

in $C$ list the inputs required by the manufacturing, agricultural, and utilities sectors,
respectively, to produce $1.00 worth of output. These are called the consumption vectors
of the sectors. For example, $\mathbf{c}_1$ tells us that to produce $1.00 worth of output the
manufacturing sector needs $0.50 worth of manufacturing output, $0.20 worth of agricultural
output, and $0.10 worth of utilities output.

What is the economic significance of the row sums of the consumption matrix?

Continuing with the above example, suppose that the open sector wants the economy
to supply it manufactured goods, agricultural products, and utilities with dollar values:

$d_1$ dollars of manufactured goods
$d_2$ dollars of agricultural products
$d_3$ dollars of utilities

The column vector $\mathbf{d}$ that has these numbers as successive components is called the outside
demand vector. Since the product-producing sectors consume some of their own output,
the dollar value of their output must cover their own needs plus the outside demand.
Suppose that the dollar values required to do this are

$x_1$ dollars of manufactured goods
$x_2$ dollars of agricultural products
$x_3$ dollars of utilities


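The column vector $\mathbf{x}$ that has these numbers as successive components is called the
production vector for the economy. For the economy with consumption matrix (1), that
portion of the production vector $\mathbf{x}$ that will be consumed by the three productive sectors is

\[
x_1 \underbrace{\begin{bmatrix} 0.5 \\ 0.2 \\ 0.1 \end{bmatrix}}_{\substack{\text{Fractions} \\ \text{consumed by} \\ \text{manufacturing}}}
+ x_2 \underbrace{\begin{bmatrix} 0.1 \\ 0.5 \\ 0.3 \end{bmatrix}}_{\substack{\text{Fractions} \\ \text{consumed by} \\ \text{agriculture}}}
+ x_3 \underbrace{\begin{bmatrix} 0.1 \\ 0.3 \\ 0.4 \end{bmatrix}}_{\substack{\text{Fractions} \\ \text{consumed} \\ \text{by utilities}}}
= \begin{bmatrix} 0.5 & 0.1 & 0.1 \\ 0.2 & 0.5 & 0.3 \\ 0.1 & 0.3 & 0.4 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
= C\mathbf{x}
\]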
The vector $C\mathbf{x}$ is called the intermediate demand vector for the economy. Once the
intermediate demand is met, the portion of the production that is left to satisfy the
outside demand is $\mathbf{x} - C\mathbf{x}$. Thus, if the outside demand vector is $\mathbf{d}$, then $\mathbf{x}$ must satisfy
the equation

\[
\underbrace{\mathbf{x}}_{\substack{\text{Amount} \\ \text{produced}}}
- \underbrace{C\mathbf{x}}_{\substack{\text{Intermediate} \\ \text{demand}}}
= \underbrace{\mathbf{d}}_{\substack{\text{Outside} \\ \text{demand}}}
\]

which we will find convenient to rewrite as

\[ (I - C)\mathbf{x} = \mathbf{d} \tag{2} \]

The matrix $I - C$ is called the Leontief matrix and (2) is called the Leontief equation.

► EXAMPLE 1 Satisfying Outside Demand

Consider the economy described in Table 1. Suppose that the open sector has a demand
for $7900 worth of manufacturing products, $3950 worth of agricultural products, and
$1975 worth of utilities.

(a) Can the economy meet this demand?

(b) If so, find a production vector $\mathbf{x}$ that will meet it exactly.

Solution The consumption matrix, production vector, and outside demand vector are

\[
C = \begin{bmatrix} 0.5 & 0.1 & 0.1 \\ 0.2 & 0.5 & 0.3 \\ 0.1 & 0.3 & 0.4 \end{bmatrix},\quad
\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix},\quad
\mathbf{d} = \begin{bmatrix} 7900 \\ 3950 \\ 1975 \end{bmatrix}
\tag{3}
\]

To meet the outside demand, the vector $\mathbf{x}$ must satisfy the Leontief equation (2), so the
problem reduces to solving the linear system

\[
\underbrace{\begin{bmatrix} 0.5 & -0.1 & -0.1 \\ -0.2 & 0.5 & -0.3 \\ -0.1 & -0.3 & 0.6 \end{bmatrix}}_{I - C}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
= \begin{bmatrix} 7900 \\ 3950 \\ 1975 \end{bmatrix}
\tag{4}
\]

(if consistent). We leave it for you to show that the reduced row echelon form of the
augmented matrix for this system is

\[
\begin{bmatrix}
1 & 0 & 0 & 27{,}500 \\
0 & 1 & 0 & 33{,}750 \\
0 & 0 & 1 & 24{,}750
\end{bmatrix}
\]

This tells us that (4) is consistent, and the economy can satisfy the demand of the open
sector exactly by producing $27,500 worth of manufacturing output, $33,750 worth of
agricultural output, and $24,750 worth of utilities output.

Productive Open Economies

In the preceding discussion we considered an open economy with three product-producing
sectors; the same ideas apply to an open economy with $n$ product-producing sectors. In
this case, the consumption matrix, production vector, and outside demand vector have
the form



\[
C = \begin{bmatrix}
c_{11} & c_{12} & \cdots & c_{1n} \\
c_{21} & c_{22} & \cdots & c_{2n} \\
\vdots & \vdots & & \vdots \\
c_{n1} & c_{n2} & \cdots & c_{nn}
\end{bmatrix},\quad
\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix},\quad
\mathbf{d} = \begin{bmatrix} d_1 \\ d_2 \\ \vdots \\ d_n \end{bmatrix}
\]

where all entries are nonnegative and

$c_{ij}$ = the monetary value of the output of the $i$th sector that is needed by the $j$th
sector to produce one unit of output

$x_i$ = the monetary value of the output of the $i$th sector

$d_i$ = the monetary value of the output of the $i$th sector that is required to meet
the demand of the open sector

Remark Note that the $j$th column vector of $C$ contains the monetary values that the $j$th sector
requires of the other sectors to produce one monetary unit of output, and the $i$th row vector of $C$
contains the monetary values required of the $i$th sector by the other sectors for each of them to
produce one monetary unit of output.


As discussed in our example above, a production vector $\mathbf{x}$ that meets the demand $\mathbf{d}$
of the outside sector must satisfy the Leontief equation

\[ (I - C)\mathbf{x} = \mathbf{d} \]

If the matrix $I - C$ is invertible, then this equation has the unique solution

\[ \mathbf{x} = (I - C)^{-1}\mathbf{d} \tag{5} \]

for every demand vector $\mathbf{d}$. However, for $\mathbf{x}$ to be a valid production vector it must
have nonnegative entries, so the problem of importance in economics is to determine
conditions under which the Leontief equation has a solution with nonnegative entries.

It is evident from the form of (5) that if $I - C$ is invertible, and if $(I - C)^{-1}$ has nonnegative
entries, then for every demand vector $\mathbf{d}$ the corresponding $\mathbf{x}$ will also have nonnegative
entries, and hence will be a valid production vector for the economy. Economies
for which $(I - C)^{-1}$ has nonnegative entries are said to be productive. Such economies
are desirable because demand can always be met by some level of production. The following
theorem, whose proof can be found in many books on economics, gives conditions
under which open economies are productive.


THEOREM 1.10.1 If $C$ is the consumption matrix for an open economy, and if all of
the column sums are less than 1, then the matrix $I - C$ is invertible, the entries of
$(I - C)^{-1}$ are nonnegative, and the economy is productive.


Remark The $j$th column sum of $C$ represents the total dollar value of input that the $j$th sector
requires to produce $1 of output, so if the $j$th column sum is less than 1, then the $j$th sector
requires less than $1 of input to produce $1 of output; in this case we say that the $j$th sector is
profitable. Thus, Theorem 1.10.1 states that if all product-producing sectors of an open economy
are profitable, then the economy is productive. In the exercises we will ask you to show that an
open economy is productive if all of the row sums of $C$ are less than 1 (Exercise 11). Thus, an open
economy is productive if either all of the column sums or all of the row sums of $C$ are less than 1.


► EXAMPLE 2 An Open Economy Whose Sectors Are All Profitable

The column sums of the consumption matrix $C$ in (1) are less than 1, so $(I - C)^{-1}$ exists
and has nonnegative entries. Use a calculating utility to confirm this, and use this inverse
to solve Equation (4) in Example 1.

Solution We leave it for you to show that

\[
(I - C)^{-1} \approx
\begin{bmatrix}
2.65823 & 1.13924 & 1.01266 \\
1.89873 & 3.67089 & 2.15190 \\
1.39241 & 2.02532 & 2.91139
\end{bmatrix}
\]

This matrix has nonnegative entries, and

\[
\mathbf{x} = (I - C)^{-1}\mathbf{d} \approx
\begin{bmatrix}
2.65823 & 1.13924 & 1.01266 \\
1.89873 & 3.67089 & 2.15190 \\
1.39241 & 2.02532 & 2.91139
\end{bmatrix}
\begin{bmatrix} 7900 \\ 3950 \\ 1975 \end{bmatrix}
\approx
\begin{bmatrix} 27{,}500 \\ 33{,}750 \\ 24{,}750 \end{bmatrix}
\]

which is consistent with the solution in Example 1.
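
In place of a calculating utility, the checks in Examples 1 and 2 can be sketched in a few lines of code: confirm that the column sums of $C$ are below 1, invert $I - C$ by Gauss-Jordan elimination on $[\,I - C \mid I\,]$, and multiply by $\mathbf{d}$. The helper name `inverse` and the variable names are assumptions made here; exact fractions are used so the production vector comes out without rounding.

```python
from fractions import Fraction

# Consumption matrix (1) and outside demand from Example 1, as exact fractions.
C = [[Fraction(s) for s in row] for row in (
    ("0.5", "0.1", "0.1"),
    ("0.2", "0.5", "0.3"),
    ("0.1", "0.3", "0.4"),
)]
d = [7900, 3950, 1975]
n = len(C)

def inverse(a):
    """Invert a square matrix by Gauss-Jordan elimination on [a | I].

    Assumes the matrix is invertible (a pivot is found in every column).
    """
    size = len(a)
    m = [list(row) + [Fraction(int(i == j)) for j in range(size)]
         for i, row in enumerate(a)]
    for col in range(size):
        pivot = next(r for r in range(col, size) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        lead = m[col][col]
        m[col] = [v / lead for v in m[col]]
        for r in range(size):
            if r != col:
                factor = m[r][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [row[size:] for row in m]

column_sums = [sum(C[i][j] for i in range(n)) for j in range(n)]
leontief = [[Fraction(int(i == j)) - C[i][j] for j in range(n)] for i in range(n)]
inv = inverse(leontief)
x = [sum(inv[i][j] * d[j] for j in range(n)) for i in range(n)]

print([float(s) for s in column_sums])              # each should be below 1
print([[round(float(v), 5) for v in row] for row in inv])
print([int(v) for v in x])
```

The entries of the exact inverse are multiples of 1/79, which is why the decimal form shown in the solution is only approximate.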


Exercise Set 1.10 

1. An automobile mechanic (M) and a body shop (B) use each
other's services. For each $1.00 of business that M does, it
uses $0.50 of its own services and $0.25 of B's services, and
for each $1.00 of business that B does it uses $0.10 of its own
services and $0.25 of M's services.

(a) Construct a consumption matrix for this economy.

(b) How much must M and B each produce to provide customers
with $7000 worth of mechanical work and $14,000
worth of body work?

2. A simple economy produces food (F) and housing (H). The
production of $1.00 worth of food requires $0.30 worth of
food and $0.10 worth of housing, and the production of $1.00
worth of housing requires $0.20 worth of food and $0.60 worth
of housing.

(a) Construct a consumption matrix for this economy. 

(b) What dollar value of food and housing must be produced 
for the economy to provide consumers $130,000 worth of 
food and $130,000 worth of housing? 

3. Consider the open economy described by the accompanying 
table, where the input is in dollars needed for $1.00 of output. 

(a) Find the consumption matrix for the economy. 

(b) Suppose that the open sector has a demand for $1930 
worth of housing, $3860 worth of food, and $5790 worth 
of utilities. Use row reduction to find a production vector 
that will meet this demand exactly. 

Table Ex-3

                 Input Required per Dollar Output

               Housing      Food      Utilities

Housing         $0.10       $0.60       $0.40

Food            $0.30       $0.20       $0.30

Utilities       $0.40       $0.10       $0.20

4. A company produces Web design, software, and networking 
services. View the company as an open economy described by 
the accompanying table, where input is in dollars needed for 
$1.00 of output. 

(a) Find the consumption matrix for the company. 

(b) Suppose that the customers (the open sector) have a de- 
mand for $5400 worth of Web design, $2700 worth of soft- 
ware, and $900 worth of networking. Use row reduction 
to find a production vector that will meet this demand 
exactly. 

Table Ex-4

                 Input Required per Dollar Output

               Web Design    Software    Networking

Web Design       $0.40         $0.20        $0.45

Software         $0.30         $0.35        $0.30

Networking       $0.15         $0.10        $0.20


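In Exercises 5-6, use matrix inversion to find the production
vector $\mathbf{x}$ that meets the demand $\mathbf{d}$ for the consumption matrix $C$.

5. $C = \begin{bmatrix} 0.1 & 0.3 \\ 0.5 & 0.4 \end{bmatrix}$; $\mathbf{d} = \begin{bmatrix} 50 \\ 60 \end{bmatrix}$

6. $C = \begin{bmatrix} 0.3 & 0.1 \\ 0.3 & 0.7 \end{bmatrix}$; $\mathbf{d} = \begin{bmatrix} 22 \\ 14 \end{bmatrix}$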


7. Consider an open economy with consumption matrix 



(a) Show that the economy can meet a demand of $d_1 = 2$ units
from the first sector and $d_2 = 0$ units from the second sector,
but it cannot meet a demand of $d_1 = 2$ units from the
first sector and $d_2 = 1$ unit from the second sector.

(b) Give both a mathematical and an economic explanation 
of the result in part (a). 
8. Consider an open economy with consumption matrix


True-False Exercises 


”1 I i" 

2 4 4 


III 

- 2 4 8 _ 

If the open sector demands the same dollar value from each 
product-producing sector, which such sector must produce the 
greatest dollar value to meet the demand? Is the economy pro- 
ductive? 

9. Consider an open economy with consumption matrix 


\[
C = \begin{bmatrix} c_{11} & c_{12} \\ c_{21} & 0 \end{bmatrix}
\]

Show that the Leontief equation $\mathbf{x} - C\mathbf{x} = \mathbf{d}$ has a unique
solution for every demand vector $\mathbf{d}$ if $c_{21}c_{12} < 1 - c_{11}$.


Working with Proofs 

10. (a) Consider an open economy with a consumption matrix
$C$ whose column sums are less than 1, and let $\mathbf{x}$ be the
production vector that satisfies an outside demand $\mathbf{d}$; that
is, $(I - C)^{-1}\mathbf{d} = \mathbf{x}$. Let $\mathbf{d}_j$ be the demand vector that is
obtained by increasing the $j$th entry of $\mathbf{d}$ by 1 and leaving
the other entries fixed. Prove that the production vector
$\mathbf{x}_j$ that meets this demand is

$\mathbf{x}_j = \mathbf{x} + j$th column vector of $(I - C)^{-1}$

(b) In words, what is the economic significance of the $j$th column
vector of $(I - C)^{-1}$? [Hint: Look at $\mathbf{x}_j - \mathbf{x}$.]

11. Prove: If $C$ is an $n \times n$ matrix whose entries are nonnegative
and whose row sums are less than 1, then $I - C$ is invertible
and has nonnegative entries. [Hint: $(A^T)^{-1} = (A^{-1})^T$ for any
invertible matrix $A$.]


TF. In parts (a)-(e) determine whether the statement is true or 
false, and justify your answer. 

(a) Sectors of an economy that produce outputs are called open 
sectors. 

(b) A closed economy is an economy that has no open sectors. 

(c) The rows of a consumption matrix represent the outputs in a 
sector of an economy. 

(d) If the column sums of the consumption matrix are all less than 
1, then the Leontief matrix is invertible. 

(e) The Leontief equation relates the production vector for an 
economy to the outside demand vector. 

Working with Technology 

T1. The following table describes an open economy with three sectors
in which the table entries are the dollar inputs required to produce
one dollar of output. The outside demand during a 1-week
period is $50,000 of coal, $75,000 of electricity, and $1,250,000
of manufacturing. Determine whether the economy can meet the
demand.

                    Input Required per Dollar Output

                 Electricity     Coal     Manufacturing

Electricity         $0.1         $0.25        $0.2

Coal                $0.3         $0.4         $0.5

Manufacturing       $0.1         $0.15        $0.1


Supplementary Exercises 


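In Exercises 1-4 the given matrix represents an augmented
matrix for a linear system. Write the corresponding set of linear
equations for the system, and use Gaussian elimination to solve
the linear system. Introduce free parameters as necessary.

1. $\begin{bmatrix} 3 & -1 & 0 & 4 & 1 \\ 2 & 0 & 3 & 3 & -1 \end{bmatrix}$

2. $\begin{bmatrix} 1 & 4 & -1 \\ -2 & -8 & 2 \\ 3 & 12 & -3 \\ 0 & 0 & 0 \end{bmatrix}$

3. $\begin{bmatrix} 2 & -4 & 1 & 6 \\ -4 & 0 & 3 & -1 \\ 0 & 1 & -1 & 3 \end{bmatrix}$

4. $\begin{bmatrix} 3 & 1 & -2 \\ -9 & -3 & 6 \\ 6 & 2 & 1 \end{bmatrix}$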


5. 


Use Gauss-Jordan elimination to solve for x' and y' in terms 
of x and y. 


y = U + !/ 


6. Use Gauss-Jordan elimination to solve for $x'$ and $y'$ in terms
of $x$ and $y$.

$$x = x'\cos\theta - y'\sin\theta$$
$$y = x'\sin\theta + y'\cos\theta$$

7. Find positive integers that satisfy

$$x + y + z = 9$$
$$x + 5y + 10z = 44$$
8. A box containing pennies, nickels, and dimes has 13 coins with
a total value of 83 cents. How many coins of each type are in
the box?

15. Find values of $a$, $b$, and $c$ such that the graph of the
polynomial $p(x) = ax^2 + bx + c$ passes through the points $(1, 2)$,
$(-1, 6)$, and $(2, 3)$.


9. Let

$$\begin{bmatrix} a & 0 & b & 2 \\ a & a & 4 & 4 \\ 0 & a & 2 & b \end{bmatrix}$$

be the augmented matrix for a linear system. Find for what
values of $a$ and $b$ the system has

(a) a unique solution.

(b) a one-parameter solution.

(c) a two-parameter solution.

(d) no solution.

10. For which value(s) of $a$ does the following system have zero
solutions? One solution? Infinitely many solutions?

$$x_1 + x_2 + x_3 = 4$$
$$x_3 = 2$$
$$(a^2 - 4)x_3 = a - 2$$

16. (Calculus required) Find values of $a$, $b$, and $c$ such that
the graph of $p(x) = ax^2 + bx + c$ passes through the point
$(-1, 0)$ and has a horizontal tangent at $(2, -9)$.

17. Let $J_n$ be the $n \times n$ matrix each of whose entries is 1. Show
that if $n > 1$, then

$$(I - J_n)^{-1} = I - \frac{1}{n - 1} J_n$$

18. Show that if a square matrix $A$ satisfies

$$A^3 + 4A^2 - 2A + 7I = 0$$

then so does $A^T$.

11. Find a matrix K such that AKB = C given that 


19. Prove: If $B$ is invertible, then $AB^{-1} = B^{-1}A$ if and only if
$AB = BA$.

20. Prove: If $A$ is invertible, then $A + B$ and $I + BA^{-1}$ are both
invertible or both not invertible.



21. Prove: If $A$ is an $m \times n$ matrix and $B$ is the $n \times 1$ matrix each
of whose entries is $1/n$, then

$$AB = \begin{bmatrix} \bar{r}_1 \\ \bar{r}_2 \\ \vdots \\ \bar{r}_m \end{bmatrix}$$

12. How should the coefficients a, b , and c be chosen so that the 
system 

ax + by — 3z = — 3 
—2x — by + cz = — 1 
ax + 3y — cz = — 3 

has the solution x = 1, y = — 1, and z = 2? 

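13. In each part, solve the matrix equation for $X$.

(a) $X \begin{bmatrix} -1 & 0 & 1 \\ 1 & 1 & 0 \\ 3 & 1 & -1 \end{bmatrix} = \begin{bmatrix} 1 & 2 & 0 \\ -3 & 1 & 5 \end{bmatrix}$

(b) $X \begin{bmatrix} 1 & -1 & 2 \\ 3 & 0 & 1 \end{bmatrix} = \begin{bmatrix} -5 & -1 & 0 \\ 6 & -3 & 7 \end{bmatrix}$

(c) $\begin{bmatrix} 3 & 1 \\ -1 & 2 \end{bmatrix} X - X \begin{bmatrix} 1 & 4 \\ 2 & 0 \end{bmatrix} = \begin{bmatrix} 2 & -2 \\ 5 & 4 \end{bmatrix}$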

14. Let $A$ be a square matrix.

(a) Show that $(I - A)^{-1} = I + A + A^2 + A^3$ if $A^4 = 0$.

(b) Show that

$$(I - A)^{-1} = I + A + A^2 + \cdots + A^n$$

if $A^{n+1} = 0$.


In Exercise 21, $\bar{r}_i$ is the average of the entries in the $i$th row of $A$.

22. (Calculus required) If the entries of the matrix

$$C = \begin{bmatrix} c_{11}(x) & c_{12}(x) & \cdots & c_{1n}(x) \\ c_{21}(x) & c_{22}(x) & \cdots & c_{2n}(x) \\ \vdots & \vdots & & \vdots \\ c_{m1}(x) & c_{m2}(x) & \cdots & c_{mn}(x) \end{bmatrix}$$

are differentiable functions of $x$, then we define

$$\frac{dC}{dx} = \begin{bmatrix} c'_{11}(x) & c'_{12}(x) & \cdots & c'_{1n}(x) \\ c'_{21}(x) & c'_{22}(x) & \cdots & c'_{2n}(x) \\ \vdots & \vdots & & \vdots \\ c'_{m1}(x) & c'_{m2}(x) & \cdots & c'_{mn}(x) \end{bmatrix}$$

Show that if the entries in $A$ and $B$ are differentiable functions
of $x$ and the sizes of the matrices are such that the stated
operations can be performed, then

(a) $\dfrac{d}{dx}(kA) = k\dfrac{dA}{dx}$

(b) $\dfrac{d}{dx}(A + B) = \dfrac{dA}{dx} + \dfrac{dB}{dx}$

(c) $\dfrac{d}{dx}(AB) = \dfrac{dA}{dx}B + A\dfrac{dB}{dx}$
23. (Calculus required) Use part (c) of Exercise 22 to show that

$$\frac{dA^{-1}}{dx} = -A^{-1}\frac{dA}{dx}A^{-1}$$

State all the assumptions you make in obtaining this formula.


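24. Assuming that the stated inverses exist, prove the following
equalities.

(a) $(C^{-1} + D^{-1})^{-1} = C(C + D)^{-1}D$

(b) $(I + CD)^{-1}C = C(I + DC)^{-1}$

(c) $(C + DD^T)^{-1}D = C^{-1}D(I + D^TC^{-1}D)^{-1}$

Partitioned matrices can be multiplied by the row-column rule
just as if the matrix entries were numbers provided that the sizes
of all matrices are such that the necessary operations can be
performed. Thus, for example, if $A$ is partitioned into a $2 \times 2$ matrix
and $B$ into a $2 \times 1$ matrix, then

$$AB = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix} \begin{bmatrix} B_1 \\ B_2 \end{bmatrix} = \begin{bmatrix} A_{11}B_1 + A_{12}B_2 \\ A_{21}B_1 + A_{22}B_2 \end{bmatrix} \qquad (*)$$

provided that the sizes are such that $AB$, the two sums, and the
four products are all defined.

25. Let $A$ and $B$ be the following partitioned matrices.

$$A = \left[\begin{array}{ccc|cc} 1 & 0 & 2 & 1 & 4 \\ 4 & 1 & 0 & 3 & -1 \\ \hline 0 & -3 & 4 & 2 & -2 \end{array}\right] = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}$$

$$B = \left[\begin{array}{cc} 3 & 0 \\ 2 & 1 \\ 4 & -1 \\ \hline 0 & 3 \\ 2 & 5 \end{array}\right] = \begin{bmatrix} B_1 \\ B_2 \end{bmatrix}$$

(a) Confirm that the sizes of all matrices are such that the
product $AB$ can be obtained using Formula (*).

(b) Confirm that the result obtained using Formula (*) agrees
with that obtained using ordinary matrix multiplication.

26. Suppose that an invertible matrix $A$ is partitioned as

$$A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}$$

Show that

$$A^{-1} = \begin{bmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{bmatrix}$$

where

$$B_{11} = (A_{11} - A_{12}A_{22}^{-1}A_{21})^{-1}, \qquad B_{12} = -B_{11}A_{12}A_{22}^{-1}$$

$$B_{21} = -A_{22}^{-1}A_{21}B_{11}, \qquad B_{22} = (A_{22} - A_{21}A_{11}^{-1}A_{12})^{-1}$$

provided all the inverses in these formulas exist.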


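27. In the special case where matrix $A_{21}$ in Exercise 26 is zero, the
matrix $A$ simplifies to

$$A = \begin{bmatrix} A_{11} & A_{12} \\ 0 & A_{22} \end{bmatrix}$$

which is said to be in block upper triangular form. Use the
result of Exercise 26 to show that in this case

$$A^{-1} = \begin{bmatrix} A_{11}^{-1} & -A_{11}^{-1}A_{12}A_{22}^{-1} \\ 0 & A_{22}^{-1} \end{bmatrix}$$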


28. A linear system whose coefficient matrix has a pivot position 
in every row must be consistent. Explain why this must be so. 


29. What can you say about the consistency or inconsistency of 
a linear system of three equations in five unknowns whose 
coefficient matrix has three pivot columns? 
Determinants

CHAPTER CONTENTS

2.1 Determinants by Cofactor Expansion

2.2 Evaluating Determinants by Row Reduction

2.3 Properties of Determinants; Cramer's Rule

INTRODUCTION In this chapter we will study “determinants” or, more precisely, “determinant
functions.” Unlike real-valued functions, such as $f(x) = x^2$, that assign a real number
to a real variable $x$, determinant functions assign a real number $f(A)$ to a matrix
variable $A$. Although determinants first arose in the context of solving systems of
linear equations, they are rarely used for that purpose in real-world applications. While
they can be useful for solving very small linear systems (say two or three unknowns),
our main interest in them stems from the fact that they link together various concepts
in linear algebra and provide a useful formula for the inverse of a matrix.


2.1 Determinants by Cofactor Expansion 

In this section we will define the notion of a “determinant.” This will enable us to develop a 
specific formula for the inverse of an invertible matrix, whereas up to now we have had only 
a computational procedure for finding it. This, in turn, will eventually provide us with a 
formula for solutions of certain kinds of linear systems. 


Recall from Theorem 1.4.5 that the $2 \times 2$ matrix

$$A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$$

is invertible if and only if $ad - bc \neq 0$ and that the expression $ad - bc$ is called the
determinant of the matrix $A$. Recall also that this determinant is denoted by writing

$$\det(A) = ad - bc \quad \text{or} \quad \begin{vmatrix} a & b \\ c & d \end{vmatrix} = ad - bc \qquad (1)$$

and that the inverse of $A$ can be expressed in terms of the determinant as

$$A^{-1} = \frac{1}{\det(A)} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix} \qquad (2)$$

WARNING It is important to keep in mind that det(A) is a
number, whereas A is a matrix.

Minors and Cofactors


One of our main goals in this chapter is to obtain an analog of Formula (2) that is
applicable to square matrices of all orders. For this purpose we will find it convenient
to use subscripted entries when writing matrices or determinants. Thus, if we denote a
$2 \times 2$ matrix as

$$A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}$$
then the two equations in (1) take the form

$$\det(A) = \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} = a_{11}a_{22} - a_{12}a_{21} \qquad (3)$$

In situations where it is inconvenient to assign a name to the matrix, we can express this
formula as

$$\det \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} = a_{11}a_{22} - a_{12}a_{21} \qquad (4)$$

There are various methods for defining determinants of higher-order square matrices.
In this text, we will use an “inductive definition,” by which we mean that the determinant
of a square matrix of a given order will be defined in terms of determinants of square
matrices of the next lower order. To start the process, let us define the determinant of a
$1 \times 1$ matrix $[a_{11}]$ as

$$\det[a_{11}] = a_{11} \qquad (5)$$

from which it follows that Formula (4) can be expressed as

$$\det \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} = \det[a_{11}]\det[a_{22}] - \det[a_{12}]\det[a_{21}]$$

Now that we have established a starting point, we can define determinants of $3 \times 3$
matrices in terms of determinants of $2 \times 2$ matrices, then determinants of $4 \times 4$ matrices
in terms of determinants of $3 \times 3$ matrices, and so forth, ad infinitum. The following
terminology and notation will help to make this inductive process more efficient.


DEFINITION 1 If $A$ is a square matrix, then the minor of entry $a_{ij}$ is denoted by $M_{ij}$
and is defined to be the determinant of the submatrix that remains after the $i$th row
and $j$th column are deleted from $A$. The number $(-1)^{i+j}M_{ij}$ is denoted by $C_{ij}$ and
is called the cofactor of entry $a_{ij}$.


WARNING We have followed 
the standard convention of us- 
ing capital letters to denote 
minors and cofactors even 
though they are numbers, not 
matrices. 
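
Definition 1 translates almost directly into code. The following is a minimal sketch in Python, not part of the text: the function names and the sample matrix are assumptions chosen for illustration, and rows and columns are numbered from 1 to match the notation $M_{ij}$ and $C_{ij}$.

```python
def delete_row_col(A, i, j):
    """Submatrix of A after deleting row i and column j (1-based, as in the text)."""
    return [[A[r][c] for c in range(len(A[r])) if c != j - 1]
            for r in range(len(A)) if r != i - 1]


def det2(M):
    """Determinant of a 2 x 2 matrix: ad - bc."""
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]


def minor(A, i, j):
    """Minor M_ij of a 3 x 3 matrix A."""
    return det2(delete_row_col(A, i, j))


def cofactor(A, i, j):
    """Cofactor C_ij = (-1)^(i+j) * M_ij."""
    return (-1) ** (i + j) * minor(A, i, j)


# Sample 3 x 3 matrix (an illustrative choice, not taken from the definition).
A = [[3, 1, -4],
     [2, 5, 6],
     [1, 4, 8]]

print(minor(A, 1, 1), cofactor(A, 1, 1))  # M_11 and C_11 agree, since 1 + 1 is even
print(minor(A, 3, 2), cofactor(A, 3, 2))  # M_32 and C_32 differ in sign, since 3 + 2 is odd
```

The sketch only handles $3 \times 3$ input because `minor` relies on the $2 \times 2$ formula; the inductive definition developed later in the section removes that restriction.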


► EXAMPLE 1 Finding Minors and Cofactors

Let

$$A = \begin{bmatrix} 3 & 1 & -4 \\ 2 & 5 & 6 \\ 1 & 4 & 8 \end{bmatrix}$$

The minor of entry $a_{11}$ is

$$M_{11} = \begin{vmatrix} 5 & 6 \\ 4 & 8 \end{vmatrix} = 16$$

The cofactor of $a_{11}$ is

$$C_{11} = (-1)^{1+1}M_{11} = M_{11} = 16$$

Similarly, the minor of entry $a_{32}$ is

$$M_{32} = \begin{vmatrix} 3 & -4 \\ 2 & 6 \end{vmatrix} = 26$$

The cofactor of $a_{32}$ is

$$C_{32} = (-1)^{3+2}M_{32} = -M_{32} = -26$$

Historical Note The term determinant was first introduced by the German mathematician Carl
Friedrich Gauss in 1801 (see p. 15), who used them to "determine" properties of certain kinds of
functions. Interestingly, the term matrix is derived from a Latin word for "womb" because it was
viewed as a container of determinants.
Remark Note that a minor $M_{ij}$ and its corresponding cofactor $C_{ij}$ are either the same or negatives
of each other and that the relating sign $(-1)^{i+j}$ is either $+1$ or $-1$ in accordance with the pattern
in the “checkerboard” array

$$\begin{bmatrix} + & - & + & - & + & \cdots \\ - & + & - & + & - & \cdots \\ + & - & + & - & + & \cdots \\ - & + & - & + & - & \cdots \\ \vdots & \vdots & \vdots & \vdots & \vdots & \end{bmatrix}$$

For example,

$$C_{11} = M_{11}, \quad C_{21} = -M_{21}, \quad C_{22} = M_{22}$$

and so forth. Thus, it is never really necessary to calculate $(-1)^{i+j}$ to calculate $C_{ij}$; you can simply
compute the minor $M_{ij}$ and then adjust the sign in accordance with the checkerboard pattern. Try
this in Example 1.


► EXAMPLE 2 Cofactor Expansions of a 2 x 2 Matrix

The checkerboard pattern for a $2 \times 2$ matrix $A = [a_{ij}]$ is

$$\begin{bmatrix} + & - \\ - & + \end{bmatrix}$$

so that

$$C_{11} = M_{11} = a_{22} \qquad C_{12} = -M_{12} = -a_{21}$$

$$C_{21} = -M_{21} = -a_{12} \qquad C_{22} = M_{22} = a_{11}$$

We leave it for you to use Formula (3) to verify that det(A) can be expressed in terms of
cofactors in the following four ways:

$$\begin{aligned} \det(A) = \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} &= a_{11}C_{11} + a_{12}C_{12} \\ &= a_{21}C_{21} + a_{22}C_{22} \\ &= a_{11}C_{11} + a_{21}C_{21} \\ &= a_{12}C_{12} + a_{22}C_{22} \end{aligned} \qquad (6)$$

Each of the last four equations is called a cofactor expansion of det(A). In each cofactor
expansion the entries and cofactors all come from the same row or same column of A.
For example, in the first equation the entries and cofactors all come from the first row of
A, in the second they all come from the second row of A, in the third they all come from
the first column of A, and in the fourth they all come from the second column of A.

Historical Note The term minor is apparently due to the English mathematician James Sylvester (see
p. 35), who wrote the following in a paper published in 1850: "Now conceive any one line and any one
column be struck out, we get...a square, one term less in breadth and depth than the original square;
and by varying in every possible selection of the line and column excluded, we obtain, supposing
the original square to consist of n lines and n columns, $n^2$ such minor squares, each of which will
represent what I term a "First Minor Determinant" relative to the principal or complete determinant."

Definition of a General Determinant

Formula (6) is a special case of the following general result, which we will state without
proof.


THEOREM 2.1.1 If A is an n x n matrix, then regardless of which row or column of A
is chosen, the number obtained by multiplying the entries in that row or column by the
corresponding cofactors and adding the resulting products is always the same.


This result allows us to make the following definition. 


DEFINITION 2 If $A$ is an $n \times n$ matrix, then the number obtained by multiplying the
entries in any row or column of $A$ by the corresponding cofactors and adding the
resulting products is called the determinant of $A$, and the sums themselves are called
cofactor expansions of $A$. That is,

$$\det(A) = a_{1j}C_{1j} + a_{2j}C_{2j} + \cdots + a_{nj}C_{nj} \qquad (7)$$

[cofactor expansion along the $j$th column]

and

$$\det(A) = a_{i1}C_{i1} + a_{i2}C_{i2} + \cdots + a_{in}C_{in} \qquad (8)$$

[cofactor expansion along the $i$th row]
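
The inductive definition can be sketched as a short recursive routine. The Python code below is an illustration under stated assumptions rather than the text's own procedure: the names `expand_along_row` and `expand_along_column` and the sample matrix are hypothetical, and indices are 0-based, which leaves the sign $(-1)^{i+j}$ unchanged because shifting both indices by 1 does not change the parity of their sum.

```python
def submatrix(A, i, j):
    """Copy of A with row i and column j removed (0-based indices)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]


def det(A):
    """Determinant by cofactor expansion along the first row; a 1 x 1 matrix is the base case."""
    if len(A) == 1:
        return A[0][0]
    return expand_along_row(A, 0)


def expand_along_row(A, i):
    """Sum of a_ij * C_ij over the entries of row i."""
    return sum((-1) ** (i + j) * A[i][j] * det(submatrix(A, i, j))
               for j in range(len(A)))


def expand_along_column(A, j):
    """Sum of a_ij * C_ij over the entries of column j."""
    return sum((-1) ** (i + j) * A[i][j] * det(submatrix(A, i, j))
               for i in range(len(A)))


# Sample matrix chosen for illustration.
A = [[3, 1, 0],
     [-2, -4, 3],
     [5, 4, -2]]

# Theorem 2.1.1 says every choice of row or column gives the same number.
print([expand_along_row(A, i) for i in range(3)])
print([expand_along_column(A, j) for j in range(3)])
```

The recursion performs on the order of $n!$ multiplications, so it is suitable only for small matrices.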


► EXAMPLE 3 Cofactor Expansion Along the First Row

Find the determinant of the matrix

$$A = \begin{bmatrix} 3 & 1 & 0 \\ -2 & -4 & 3 \\ 5 & 4 & -2 \end{bmatrix}$$

by cofactor expansion along the first row.

Solution

$$\det(A) = \begin{vmatrix} 3 & 1 & 0 \\ -2 & -4 & 3 \\ 5 & 4 & -2 \end{vmatrix} = 3 \begin{vmatrix} -4 & 3 \\ 4 & -2 \end{vmatrix} - 1 \begin{vmatrix} -2 & 3 \\ 5 & -2 \end{vmatrix} + 0 \begin{vmatrix} -2 & -4 \\ 5 & 4 \end{vmatrix}$$

$$= 3(-4) - (1)(-11) + 0 = -1$$

Historical Note Cofactor expansion is not the only method for expressing the
determinant of a matrix in terms of determinants of lower order. For example,
although it is not well known, the English mathematician Charles Lutwidge Dodgson
(1832-1898), who was the author of Alice's Adventures in Wonderland and Through
the Looking Glass under the pen name of Lewis Carroll, invented such a method, called
condensation. That method has recently been resurrected from obscurity because of its
suitability for parallel processing on computers.


Note that in Example 4 we had 
to compute three cofactors, 
whereas in Example 3 only two 
were needed because the third 
was multiplied by zero. As a 
rule, the best strategy for co- 
factor expansion is to expand 
along a row or column with the 
most zeros. 


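► EXAMPLE 4 Cofactor Expansion Along the First Column

Let A be the matrix in Example 3, and evaluate det(A) by cofactor expansion along the
first column of A.

Solution

$$\det(A) = \begin{vmatrix} 3 & 1 & 0 \\ -2 & -4 & 3 \\ 5 & 4 & -2 \end{vmatrix} = 3 \begin{vmatrix} -4 & 3 \\ 4 & -2 \end{vmatrix} - (-2) \begin{vmatrix} 1 & 0 \\ 4 & -2 \end{vmatrix} + 5 \begin{vmatrix} 1 & 0 \\ -4 & 3 \end{vmatrix}$$

$$= 3(-4) - (-2)(-2) + 5(3) = -1$$

This agrees with the result obtained in Example 3.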

► EXAMPLE 5 Smart Choice of Row or Column

If $A$ is the $4 \times 4$ matrix

$$A = \begin{bmatrix} 1 & 0 & 0 & -1 \\ 3 & 1 & 2 & 2 \\ 1 & 0 & -2 & 1 \\ 2 & 0 & 0 & 1 \end{bmatrix}$$

then to find det(A) it will be easiest to use cofactor expansion along the second column,
since it has the most zeros:

$$\det(A) = 1 \cdot \begin{vmatrix} 1 & 0 & -1 \\ 1 & -2 & 1 \\ 2 & 0 & 1 \end{vmatrix}$$

For the $3 \times 3$ determinant, it will be easiest to use cofactor expansion along its second
column, since it has the most zeros:

$$\det(A) = 1 \cdot (-2) \cdot \begin{vmatrix} 1 & -1 \\ 2 & 1 \end{vmatrix} = -2(1 + 2) = -6$$

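► EXAMPLE 6 Determinant of a Lower Triangular Matrix

The following computation shows that the determinant of a $4 \times 4$ lower triangular matrix
is the product of its diagonal entries. Each part of the computation uses a cofactor
expansion along the first row.

$$\begin{vmatrix} a_{11} & 0 & 0 & 0 \\ a_{21} & a_{22} & 0 & 0 \\ a_{31} & a_{32} & a_{33} & 0 \\ a_{41} & a_{42} & a_{43} & a_{44} \end{vmatrix} = a_{11} \begin{vmatrix} a_{22} & 0 & 0 \\ a_{32} & a_{33} & 0 \\ a_{42} & a_{43} & a_{44} \end{vmatrix} = a_{11}a_{22} \begin{vmatrix} a_{33} & 0 \\ a_{43} & a_{44} \end{vmatrix}$$

$$= a_{11}a_{22}a_{33}\,|a_{44}| = a_{11}a_{22}a_{33}a_{44}$$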
The method illustrated in Example 6 can be easily adapted to prove the following
general result.


THEOREM 2.1.2 If $A$ is an $n \times n$ triangular matrix (upper triangular, lower triangular,
or diagonal), then det(A) is the product of the entries on the main diagonal of
the matrix; that is, $\det(A) = a_{11}a_{22} \cdots a_{nn}$.
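
As a quick numerical check of Theorem 2.1.2, the following Python sketch compares a first-row cofactor expansion with the product of the diagonal entries. The lower triangular matrix `L`, its transpose `U`, and the helper names are assumptions made for illustration; the check confirms the theorem on these samples only and is not a proof.

```python
from math import prod


def det(A):
    """Determinant by cofactor expansion along the first row."""
    if len(A) == 1:
        return A[0][0]
    total = 0
    for j, entry in enumerate(A[0]):
        if entry == 0:
            continue  # a zero entry contributes nothing, so its cofactor is skipped
        sub = [row[:j] + row[j + 1:] for row in A[1:]]
        total += (-1) ** j * entry * det(sub)
    return total


def diagonal_product(A):
    """Product of the entries on the main diagonal."""
    return prod(A[i][i] for i in range(len(A)))


# A sample 4 x 4 lower triangular matrix and its (upper triangular) transpose.
L = [[2, 0, 0, 0],
     [1, 3, 0, 0],
     [4, 5, -1, 0],
     [7, 8, 9, 6]]
U = [list(col) for col in zip(*L)]

print(det(L), diagonal_product(L))
print(det(U), diagonal_product(U))
```

For the lower triangular sample, every expansion step meets a first row with a single nonzero entry, which is exactly the pattern exploited in Example 6.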


A Useful Technique for Evaluating 2 x 2 and 3 x 3 Determinants


Determinants of $2 \times 2$ and $3 \times 3$ matrices can be evaluated very efficiently using the
pattern suggested in Figure 2.1.1.

[Figure 2.1.1: arrow patterns for 2 x 2 and 3 x 3 determinants]



WARNING The arrow tech- 
nique works only for deter- 
minants of 2 x 2 and 3x3 
matrices. It does not work 
for matrices of size 4 x 4 or 
higher. 


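In the $2 \times 2$ case, the determinant can be computed by forming the product of the entries
on the rightward arrow and subtracting the product of the entries on the leftward arrow.
In the $3 \times 3$ case we first recopy the first and second columns as shown in the figure,
after which we can compute the determinant by summing the products of the entries
on the rightward arrows and subtracting the products on the leftward arrows. These
procedures execute the computations

$$\begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} = a_{11}a_{22} - a_{12}a_{21}$$

$$\begin{aligned} \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix} &= a_{11} \begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix} - a_{12} \begin{vmatrix} a_{21} & a_{23} \\ a_{31} & a_{33} \end{vmatrix} + a_{13} \begin{vmatrix} a_{21} & a_{22} \\ a_{31} & a_{32} \end{vmatrix} \\ &= a_{11}(a_{22}a_{33} - a_{23}a_{32}) - a_{12}(a_{21}a_{33} - a_{23}a_{31}) + a_{13}(a_{21}a_{32} - a_{22}a_{31}) \\ &= a_{11}a_{22}a_{33} + a_{12}a_{23}a_{31} + a_{13}a_{21}a_{32} - a_{13}a_{22}a_{31} - a_{12}a_{21}a_{33} - a_{11}a_{23}a_{32} \end{aligned}$$

which agrees with the cofactor expansions along the first row.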
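
The arrow technique amounts to two sums of three products each. A minimal Python sketch follows; the function names are hypothetical, and the sample matrix is an assumption picked so that the six products are easy to follow by hand.

```python
def det2_arrows(A):
    """2 x 2 case: rightward product minus leftward product."""
    (a11, a12), (a21, a22) = A
    return a11 * a22 - a12 * a21


def det3_arrows(A):
    """3 x 3 case: sum of rightward-arrow products minus sum of leftward-arrow products."""
    (a11, a12, a13), (a21, a22, a23), (a31, a32, a33) = A
    rightward = a11 * a22 * a33 + a12 * a23 * a31 + a13 * a21 * a32
    leftward = a13 * a22 * a31 + a11 * a23 * a32 + a12 * a21 * a33
    return rightward - leftward


# Sample matrix (illustrative assumption).
A = [[1, 2, 3],
     [-4, 5, 6],
     [7, -8, 9]]

# rightward: 45 + 84 + 96; leftward: 105 - 48 - 72
print(det3_arrows(A))
print(det2_arrows([[3, 1], [4, -2]]))
```

As the warning above notes, this shortcut does not extend to matrices of size $4 \times 4$ or higher, so the functions deliberately accept only the two sizes.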


► EXAMPLE 7 A Technique for Evaluating 2x2 and 3x3 Determinants 



= [45 + 84 + 96] - [105 - 48 - 72] = 240 ◄ 
Exercise Set 2.1


In Exercises 1-2, find all the minors and cofactors of the ma- 
trix A . 


17. A = 


A - 1 


18. A = 


A-4 

-1 



1 -2 3 


1 1 2 

L J 

0 0 A - 5 

1. A = 

6 7-1 

2. A = 

3 3 6 




-3 1 4 


0 1 4 

19. Evaluate the determinant in Exercise 13 by a cofactor expan 


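3. Let

$$A = \begin{bmatrix} 4 & -1 & 1 & 6 \\ 0 & 0 & -3 & 3 \\ 4 & 1 & 0 & 14 \\ 4 & 1 & 3 & 2 \end{bmatrix}$$

Find

(a) $M_{13}$ and $C_{13}$. (b) $M_{23}$ and $C_{23}$.

(c) $M_{22}$ and $C_{22}$. (d) $M_{21}$ and $C_{21}$.

4. Let

$$A = \begin{bmatrix} 2 & 3 & -1 & 1 \\ -3 & 2 & 0 & 3 \\ 3 & -2 & 1 & 0 \\ 3 & -2 & 1 & 4 \end{bmatrix}$$

Find

(a) $M_{32}$ and $C_{32}$. (b) $M_{44}$ and $C_{44}$.

(c) $M_{41}$ and $C_{41}$. (d) $M_{24}$ and $C_{24}$.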

In Exercises 5-8, evaluate the determinant of the given matrix.
If the matrix is invertible, use Equation (2) to find its inverse.


5. 


3 

-2 


6 . 


7. 


-5 7' 

-7 -2 


8 . 


•> Jl V6 

4 V3 


In Exercises 9-14, use the arrow technique to evaluate the
determinant.


9. 


11 . 


13. 


a- 3 5 

—3 a — 2 

-2 1 4 

3 5-7 

1 6 2 

3 0 0 

2 -1 5 

1 9 -4 


10 . 


12 . 


14. 


-2 

5 

3 

-1 

3 

1 

c 

2 

4 


7 6 
1 -2 

8 4 

1 2 

0 -5 

7 2 

-4 

1 

c — 1 


In Exercises 15-18, find all values of $\lambda$ for which det(A) = 0.


15. A = 


X — 2 
-5 


A + 4 


16. A = 


X — 4 0 0 

0 X 2 

0 3 A - 1 


sion along 
(a) the first row. 

(c) the second row. 
(e) the third row. 


(b) the first column. 

(d) the second column, 
(f) the third column. 


20. Evaluate the determinant in Exercise 12 by a cofactor expan- 
sion along 

(a) the first row. (b) the first column. 

(c) the second row. (d) the second column. 

(e) the third row. (f) the third column. 

In Exercises 21-26, evaluate det(A) by a cofactor expansion 

along a row or column of your choice. 


21. A = 


23. A = 


25. A = 


-3 


0 

1 


'3 

3 

l" 


2 


5 

1 

II 

rj 

1 

0 

-4 


-1 


0 

5 


1 

-3 

5 


1 

k 

k 2 ' 



~k + 

1 

k — 1 

7 

1 

k 

k 2 


II 

2 


k - 3 

4 

1 

k 

k 2 



5 


k + 1 

k 


26. A = 


3 0 5 

2 0-2 

1 -3 0 
10 3 2 

0 0 1 

3 3 -1 

2 4 2 

4 6 2 

2 4 2 


In Exercises 27-32, evaluate the determinant of the given ma- 
trix by inspection. 


2 0 0 
0 2 0 
0 0 2 


1111 ' 
0 2 2 2 
0 0 3 3 

0 0 0 4 



'l 

0 

0 " 


27. 

0 

-1 

0 

28. 


0 

0 

1 

- 


'0 

0 0 

O' 

r 


1 

2 0 

0 


29. 

0 

4 3 

0 

30. 


1 

2 3 

8 
41. Prove that the equation of the line through the distinct points
$(a_1, b_1)$ and $(a_2, b_2)$ can be written as

$$\begin{vmatrix} x & y & 1 \\ a_1 & b_1 & 1 \\ a_2 & b_2 & 1 \end{vmatrix} = 0$$


'1 

2 

7 

-3' 


'-3 

0 

0 

O' 


0 

1 

-4 

1 


1 

2 

0 

0 

31. 

0 

0 

2 

7 

32. 

40 

10 

-1 

0 


0 

0 

0 

3 


100 

200 

-23 

3 


33. In each part, show that the value of the determinant is inde- 


pendent of 9 . 


(a) 

sin 9 cos 9 

— cos 9 sin 9 



sin 9 

cos 9 

(b) 

— cos 9 

sin 9 


sin 9 — cos 9 

sin 9 + cos 9 


34. Show that the matrices 



a b 


~d 

e 

A = 

0 c 

and B = 

0 

f 


x y 

ci\ b\ 
0 2 bi 


= 0 


42. Prove that if $A$ is upper triangular and $B_{ij}$ is the matrix that
results when the $i$th row and $j$th column of $A$ are deleted, then
$B_{ij}$ is upper triangular if $i < j$.


True-False Exercises 


TF. In parts (a)-(j) determine whether the statement is true or 
false, and justify your answer. 


(a) The determinant of the $2 \times 2$ matrix $\begin{bmatrix} a & b \\ c & d \end{bmatrix}$ is $ad + bc$.


(b) Two square matrices that have the same determinant must have 
the same size. 


commute if and only if 



35. By inspection, what is the relationship between the following
determinants?

$$d_1 = \begin{vmatrix} a & b & c \\ d & 1 & f \\ g & 0 & 1 \end{vmatrix} \quad \text{and} \quad d_2 = \begin{vmatrix} a + \lambda & b & c \\ d & 1 & f \\ g & 0 & 1 \end{vmatrix}$$

(c) The minor $M_{ij}$ is the same as the cofactor $C_{ij}$ if $i + j$ is even.

(d) If $A$ is a $3 \times 3$ symmetric matrix, then $C_{ij} = C_{ji}$ for all $i$ and $j$.

(e) The number obtained by a cofactor expansion of a matrix $A$ is
independent of the row or column chosen for the expansion.

(f) If $A$ is a square matrix whose minors are all zero, then
det(A) = 0.


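36. Show that

$$\det(A) = \frac{1}{2} \begin{vmatrix} \operatorname{tr}(A) & 1 \\ \operatorname{tr}(A^2) & \operatorname{tr}(A) \end{vmatrix}$$

for every $2 \times 2$ matrix $A$.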


(g) The determinant of a lower triangular matrix is the sum of the 
entries along the main diagonal. 

(h) For every square matrix A and every scalar c, it is true that 
det(cA) = cdet(A). 

(i) For all square matrices A and B, it is true that 

det(A + B) = det(A) + det(B) 

(j) For every 2x2 matrix A it is true that det(A 2 ) = (det(A)) 2 . 


37. What can you say about an nth-order determinant all of whose 
entries are 1? Explain. 

38. What is the maximum number of zeros that a 3 x 3 matrix can 
have without having a zero determinant? Explain. 

39. Explain why the determinant of a matrix with integer entries 
must be an integer. 


Working with Technology

T1. (a) Use the determinant capability of your technology utility
to find the determinant of the matrix

$$A = \begin{bmatrix} 4.2 & -1.3 & 1.1 & 6.0 \\ 0.0 & 0.0 & -3.2 & 3.4 \\ 4.5 & 1.3 & 0.0 & 14.8 \\ 4.7 & 1.0 & 3.4 & 2.3 \end{bmatrix}$$


Working with Proofs 


40. Prove that $(x_1, y_1)$, $(x_2, y_2)$, and $(x_3, y_3)$ are collinear points
if and only if

$$\begin{vmatrix} x_1 & y_1 & 1 \\ x_2 & y_2 & 1 \\ x_3 & y_3 & 1 \end{vmatrix} = 0$$


(b) Compare the result obtained in part (a) to that obtained by a 
cofactor expansion along the second row of A. 

T2. Let $A_n$ be the $n \times n$ matrix with 2's along the main diagonal,
1's along the diagonal lines immediately above and below the main
diagonal, and zeros everywhere else. Make a conjecture about the
relationship between $n$ and $\det(A_n)$.
2.2 Evaluating Determinants by Row Reduction

In this section we will show how to evaluate a determinant by reducing the associated
matrix to row echelon form. In general, this method requires less computation than
cofactor expansion and hence is the method of choice for large matrices.

A Basic Theorem

We begin with a fundamental theorem that will lead us to an efficient procedure for
evaluating the determinant of a square matrix of any size.


THEOREM 2.2.1 Let A be a square matrix. If A has a row of zeros or a column of
zeros, then det(A) = 0.


Proof Since the determinant of $A$ can be found by a cofactor expansion along any row
or column, we can use the row or column of zeros. Thus, if we let $C_1, C_2, \ldots, C_n$ denote
the cofactors of $A$ along that row or column, then it follows from Formula (7) or (8) in
Section 2.1 that

$$\det(A) = 0 \cdot C_1 + 0 \cdot C_2 + \cdots + 0 \cdot C_n = 0$$


Because transposing a matrix 
changes its columns to rows 
and its rows to columns, al- 
most every theorem about the 
rows of a determinant has 
a companion version about 
columns, and vice versa. 


The following useful theorem relates the determinant of a matrix and the determinant 
of its transpose. 


THEOREM 2.2.2 Let $A$ be a square matrix. Then $\det(A) = \det(A^T)$.


Proof Since transposing a matrix changes its columns to rows and its rows to columns,
the cofactor expansion of $A$ along any row is the same as the cofactor expansion of $A^T$
along the corresponding column. Thus, both have the same determinant.


Elementary Row 
Operations 


The next theorem shows how an elementary row operation on a square matrix affects the 
value of its determinant. In place of a formal proof we have provided a table to illustrate 
the ideas in the 3 x 3 case (see Table 1). 


The first panel of Table 1 
shows that you can bring a 
common factor from any row 
(column) of a determinant 
through the determinant sign. 
This is a slightly different way 
of thinking about part (a) of 
Theorem 2.2.3. 


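Table 1

Relationship:

$$\begin{vmatrix} ka_{11} & ka_{12} & ka_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix} = k \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix}, \qquad \det(B) = k\det(A)$$

Operation: In the matrix $B$ the first row of $A$ was multiplied by $k$.

Relationship:

$$\begin{vmatrix} a_{21} & a_{22} & a_{23} \\ a_{11} & a_{12} & a_{13} \\ a_{31} & a_{32} & a_{33} \end{vmatrix} = - \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix}, \qquad \det(B) = -\det(A)$$

Operation: In the matrix $B$ the first and second rows of $A$ were interchanged.

Relationship:

$$\begin{vmatrix} a_{11} + ka_{21} & a_{12} + ka_{22} & a_{13} + ka_{23} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix} = \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix}, \qquad \det(B) = \det(A)$$

Operation: In the matrix $B$ a multiple of the second row of $A$ was
added to the first row.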
Elementary Matrices


Observe that the determinant 
of an elementary matrix can- 
not be zero. 


Matrices with Proportional 
Rows or Columns 


THEOREM 2.2.3 Let $A$ be an $n \times n$ matrix.

(a) If $B$ is the matrix that results when a single row or single column of $A$ is multiplied
by a scalar $k$, then $\det(B) = k\det(A)$.

(b) If $B$ is the matrix that results when two rows or two columns of $A$ are interchanged,
then $\det(B) = -\det(A)$.

(c) If $B$ is the matrix that results when a multiple of one row of $A$ is added to another
or when a multiple of one column is added to another, then $\det(B) = \det(A)$.
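
The three parts of Theorem 2.2.3 can be checked numerically on a sample matrix. The Python sketch below is illustrative only: the helper names, the sample matrix, and the scalars used are assumptions, and the determinant is computed by first-row cofactor expansion so that the check does not rely on the theorem itself.

```python
def det(A):
    """Determinant by cofactor expansion along the first row."""
    if len(A) == 1:
        return A[0][0]
    total = 0
    for j, entry in enumerate(A[0]):
        sub = [row[:j] + row[j + 1:] for row in A[1:]]
        total += (-1) ** j * entry * det(sub)
    return total


def scale_row(A, i, k):
    """Multiply row i by the scalar k."""
    return [[k * x for x in row] if r == i else list(row) for r, row in enumerate(A)]


def swap_rows(A, i, j):
    """Interchange rows i and j."""
    B = [list(row) for row in A]
    B[i], B[j] = B[j], B[i]
    return B


def add_multiple(A, src, dst, k):
    """Add k times row src to row dst."""
    B = [list(row) for row in A]
    B[dst] = [b + k * a for a, b in zip(A[src], A[dst])]
    return B


# Sample matrix (illustrative assumption); rows are indexed from 0 in the code.
A = [[0, 1, 5],
     [3, -6, 9],
     [2, 6, 1]]

d = det(A)
print(det(scale_row(A, 0, 4)) == 4 * d)      # part (a)
print(det(swap_rows(A, 0, 1)) == -d)         # part (b)
print(det(add_multiple(A, 0, 2, -2)) == d)   # part (c)
```

By Theorem 2.2.2, applying the same helpers to the transpose of `A` gives the corresponding column versions.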


We will verify the first equation in Table 1 and leave the other two for you. To start,
note that the determinants on the two sides of the equation differ only in the first row, so
these determinants have the same cofactors, $C_{11}$, $C_{12}$, $C_{13}$, along that row (since those
cofactors depend only on the entries in the second two rows). Thus, expanding the left
side by cofactors along the first row yields

$$\begin{aligned} \begin{vmatrix} ka_{11} & ka_{12} & ka_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix} &= ka_{11}C_{11} + ka_{12}C_{12} + ka_{13}C_{13} \\ &= k(a_{11}C_{11} + a_{12}C_{12} + a_{13}C_{13}) \\ &= k \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix} \end{aligned}$$


It will be useful to consider the special case of Theorem 2.2.3 in which $A = I_n$ is the
$n \times n$ identity matrix and $E$ (rather than $B$) denotes the elementary matrix that results
when the row operation is performed on $I_n$. In this special case Theorem 2.2.3 implies
the following result.


THEOREM 2.2.4 Let $E$ be an $n \times n$ elementary matrix.

(a) If $E$ results from multiplying a row of $I_n$ by a nonzero number $k$, then $\det(E) = k$.

(b) If $E$ results from interchanging two rows of $I_n$, then $\det(E) = -1$.

(c) If $E$ results from adding a multiple of one row of $I_n$ to another, then $\det(E) = 1$.


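► EXAMPLE 1 Determinants of Elementary Matrices


The following determinants of elementary matrices, which are evaluated by inspection,
illustrate Theorem 2.2.4.

$$\begin{vmatrix} 1 & 0 & 0 & 0 \\ 0 & 3 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{vmatrix} = 3$$

The second row of $I_4$ was multiplied by 3.

$$\begin{vmatrix} 0 & 0 & 0 & 1 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 \end{vmatrix} = -1$$

The first and last rows of $I_4$ were interchanged.

$$\begin{vmatrix} 1 & 0 & 0 & 7 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{vmatrix} = 1$$

7 times the last row of $I_4$ was added to the first row.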


If a square matrix A has two proportional rows, then a row of zeros can be introduced 
by adding a suitable multiple of one of the rows to the other. Similarly for columns. But 
adding a multiple of one row or column to another does not change the determinant, so 
from Theorem 2.2.1, we must have det(A) = 0. This proves the following theorem. 
Evaluating Determinants by Row Reduction


Even with today’s fastest com- 
puters it would take millions of 
years to calculate a 25 x 25 de- 
terminant by cofactor expan- 
sion, so methods based on row 
reduction are often used for 
large determinants. For deter- 
minants of small size (such as 
those in this text), cofactor ex- 
pansion is often a reasonable 
choice. 


THEOREM 2.2.5 If A is a square matrix with two proportional rows or two proportional
columns, then det(A) = 0.


► EXAMPLE 2 Proportional Rows or Columns 


Each of the following matrices has two proportional rows or columns; thus, each has a 
determinant of zero. 


'-1 4" 
-2 8 


1 -2 7 

-4 8 5 

2 -4 3 




-5' 

2 

4 

15 


◄ 


We will now give a method for evaluating determinants that involves substantially less 
computation than cofactor expansion. The idea of the method is to reduce the given 
matrix to upper triangular form by elementary row operations, then compute the de- 
terminant of the upper triangular matrix (an easy computation), and then relate that 
determinant to that of the original matrix. Here is an example. 


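► EXAMPLE 3 Using Row Reduction to Evaluate a Determinant


Evaluate det(A) where

$$A = \begin{bmatrix} 0 & 1 & 5 \\ 3 & -6 & 9 \\ 2 & 6 & 1 \end{bmatrix}$$

Solution We will reduce A to row echelon form (which is upper triangular) and then
apply Theorem 2.1.2.

$$\det(A) = \begin{vmatrix} 0 & 1 & 5 \\ 3 & -6 & 9 \\ 2 & 6 & 1 \end{vmatrix} = - \begin{vmatrix} 3 & -6 & 9 \\ 0 & 1 & 5 \\ 2 & 6 & 1 \end{vmatrix}$$

The first and second rows of A were interchanged.

$$= -3 \begin{vmatrix} 1 & -2 & 3 \\ 0 & 1 & 5 \\ 2 & 6 & 1 \end{vmatrix}$$

A common factor of 3 from the first row was taken through the determinant sign.

$$= -3 \begin{vmatrix} 1 & -2 & 3 \\ 0 & 1 & 5 \\ 0 & 10 & -5 \end{vmatrix}$$

-2 times the first row was added to the third row.

$$= -3 \begin{vmatrix} 1 & -2 & 3 \\ 0 & 1 & 5 \\ 0 & 0 & -55 \end{vmatrix}$$

-10 times the second row was added to the third row.

$$= (-3)(-55) \begin{vmatrix} 1 & -2 & 3 \\ 0 & 1 & 5 \\ 0 & 0 & 1 \end{vmatrix}$$

A common factor of -55 from the last row was taken through the determinant sign.

$$= (-3)(-55)(1) = 165$$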
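
The row-reduction method can be sketched as a short routine that tracks how each elementary operation changes the determinant. This Python version is an illustration under stated assumptions, not the text's own algorithm: the function name is hypothetical, exact arithmetic is done with `fractions.Fraction` to avoid rounding, and the pivot is taken to be the first nonzero entry in each column.

```python
from fractions import Fraction


def det_by_row_reduction(A):
    """Reduce A to row echelon form, recording the effect of each operation on det(A)."""
    M = [[Fraction(x) for x in row] for row in A]
    n = len(M)
    factor = Fraction(1)  # product of the sign changes and the factors taken out so far
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero entry in this column.
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)  # no pivot: the echelon form has a zero on its diagonal
        if pivot != col:
            M[col], M[pivot] = M[pivot], M[col]
            factor = -factor            # a row interchange changes the sign
        p = M[col][col]
        factor *= p                     # take the common factor p out of the pivot row
        M[col] = [x / p for x in M[col]]
        for r in range(col + 1, n):
            f = M[r][col]
            # Adding a multiple of one row to another leaves the determinant unchanged.
            M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    # M is now upper triangular with 1's on the diagonal, so its determinant is 1.
    return factor


A = [[0, 1, 5],
     [3, -6, 9],
     [2, 6, 1]]
print(det_by_row_reduction(A))
```

On the matrix of Example 3 the routine performs the same interchange, the same factoring of 3 and -55, and the same two row additions as the worked solution. It needs on the order of $n^3$ arithmetic operations, in contrast to the factorial growth of cofactor expansion.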
► EXAMPLE 4 Using Column Operations to Evaluate a Determinant

Compute the determinant of

$$A = \begin{bmatrix} 1 & 0 & 0 & 3 \\ 2 & 7 & 0 & 6 \\ 0 & 6 & 3 & 0 \\ 7 & 3 & 1 & -5 \end{bmatrix}$$

Solution This determinant could be computed as above by using elementary row
operations to reduce A to row echelon form, but we can put A in lower triangular form in
one step by adding -3 times the first column to the fourth to obtain

$$\det(A) = \det \begin{bmatrix} 1 & 0 & 0 & 0 \\ 2 & 7 & 0 & 0 \\ 0 & 6 & 3 & 0 \\ 7 & 3 & 1 & -26 \end{bmatrix} = (1)(7)(3)(-26) = -546$$

Example 4 points out that it is always wise to keep an eye
open for column operations that can shorten computations.


Cofactor expansion and row or column operations can sometimes be used in com- 
bination to provide an effective method for evaluating determinants. The following 
example illustrates this idea. 


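► EXAMPLE 5 Row Operations and Cofactor Expansion

Evaluate det(A) where

$$A = \begin{bmatrix} 3 & 5 & -2 & 6 \\ 1 & 2 & -1 & 1 \\ 2 & 4 & 1 & 5 \\ 3 & 7 & 5 & 3 \end{bmatrix}$$

Solution By adding suitable multiples of the second row to the remaining rows, we
obtain

$$\det(A) = \begin{vmatrix} 0 & -1 & 1 & 3 \\ 1 & 2 & -1 & 1 \\ 0 & 0 & 3 & 3 \\ 0 & 1 & 8 & 0 \end{vmatrix}$$

$$= - \begin{vmatrix} -1 & 1 & 3 \\ 0 & 3 & 3 \\ 1 & 8 & 0 \end{vmatrix}$$

Cofactor expansion along the first column

$$= - \begin{vmatrix} -1 & 1 & 3 \\ 0 & 3 & 3 \\ 0 & 9 & 3 \end{vmatrix}$$

We added the first row to the third row.

$$= -(-1) \begin{vmatrix} 3 & 3 \\ 9 & 3 \end{vmatrix} = -18$$

Cofactor expansion along the first column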
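Exercise Set 2.2

In Exercises 1-4, verify that det(A) = det($A^T$).

1. $A = \begin{bmatrix} -2 & 3 \\ 1 & 4 \end{bmatrix}$

2. $A = \begin{bmatrix} -6 & 1 \\ 2 & -2 \end{bmatrix}$

3. $A = \begin{bmatrix} 2 & -1 & 3 \\ 1 & 2 & 4 \\ 5 & -3 & 6 \end{bmatrix}$

4. $A = \begin{bmatrix} 4 & 2 & -1 \\ 0 & 2 & -3 \\ -1 & 1 & 5 \end{bmatrix}$

In Exercises 5-8, find the determinant of the given elementary matrix by inspection.

5. $\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & -5 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$

6. $\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -5 & 0 & 1 \end{bmatrix}$

7. $\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$

8. $\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & -\frac{1}{3} & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$

In Exercises 9-14, evaluate the determinant of the matrix by first reducing the matrix to row echelon form and then using some combination of row operations and cofactor expansion.

9. $\begin{bmatrix} 3 & -6 & 9 \\ -2 & 7 & -2 \\ 0 & 1 & 5 \end{bmatrix}$

10. $\begin{bmatrix} 3 & 6 & -9 \\ 0 & 0 & -2 \\ -2 & 1 & 5 \end{bmatrix}$

11. $\begin{bmatrix} 2 & 1 & 3 & 1 \\ 1 & 0 & 1 & 1 \\ 0 & 2 & 1 & 0 \\ 0 & 1 & 2 & 3 \end{bmatrix}$

12. $\begin{bmatrix} 1 & -3 & 0 \\ -2 & 4 & 1 \\ 5 & -2 & 2 \end{bmatrix}$

13. $\begin{bmatrix} 1 & 3 & 1 & 5 & 3 \\ -2 & -7 & 0 & -4 & 2 \\ 0 & 0 & 1 & 0 & 1 \\ 0 & 0 & 2 & 1 & 1 \\ 0 & 0 & 0 & 1 & 1 \end{bmatrix}$

14. $\begin{bmatrix} 1 & -2 & 3 & 1 \\ 5 & -9 & 6 & 3 \\ -1 & 2 & -6 & -2 \\ 2 & 8 & 6 & 1 \end{bmatrix}$

In Exercises 15-22, evaluate the determinant, given that

\[
\begin{vmatrix} a & b & c \\ d & e & f \\ g & h & i \end{vmatrix} = -6
\]

15. $\begin{vmatrix} d & e & f \\ g & h & i \\ a & b & c \end{vmatrix}$

16. $\begin{vmatrix} g & h & i \\ d & e & f \\ a & b & c \end{vmatrix}$

17. $\begin{vmatrix} 3a & 3b & 3c \\ -d & -e & -f \\ 4g & 4h & 4i \end{vmatrix}$

18. $\begin{vmatrix} a+d & b+e & c+f \\ -d & -e & -f \\ g & h & i \end{vmatrix}$

19. $\begin{vmatrix} a+g & b+h & c+i \\ d & e & f \\ g & h & i \end{vmatrix}$

20. $\begin{vmatrix} a & b & c \\ 2d & 2e & 2f \\ g+3a & h+3b & i+3c \end{vmatrix}$

21. $\begin{vmatrix} -3a & -3b & -3c \\ d & e & f \\ g-4d & h-4e & i-4f \end{vmatrix}$

22. $\begin{vmatrix} a & b & c \\ d & e & f \\ 2a & 2b & 2c \end{vmatrix}$

23. Use row reduction to show that

\[
\begin{vmatrix} 1 & 1 & 1 \\ a & b & c \\ a^2 & b^2 & c^2 \end{vmatrix} = (b-a)(c-a)(c-b)
\]

24. Verify the formulas in parts (a) and (b) and then make a conjecture about a general result of which these results are special cases.

(a) $\det \begin{bmatrix} 0 & 0 & a_{13} \\ 0 & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix} = -a_{13}a_{22}a_{31}$

(b) $\det \begin{bmatrix} 0 & 0 & 0 & a_{14} \\ 0 & 0 & a_{23} & a_{24} \\ 0 & a_{32} & a_{33} & a_{34} \\ a_{41} & a_{42} & a_{43} & a_{44} \end{bmatrix} = a_{14}a_{23}a_{32}a_{41}$

In Exercises 25-28, confirm the identities without evaluating the determinants directly.

25. $\begin{vmatrix} a_1 & b_1 & a_1+b_1+c_1 \\ a_2 & b_2 & a_2+b_2+c_2 \\ a_3 & b_3 & a_3+b_3+c_3 \end{vmatrix} = \begin{vmatrix} a_1 & b_1 & c_1 \\ a_2 & b_2 & c_2 \\ a_3 & b_3 & c_3 \end{vmatrix}$

26. $\begin{vmatrix} a_1+b_1t & a_2+b_2t & a_3+b_3t \\ a_1t+b_1 & a_2t+b_2 & a_3t+b_3 \\ c_1 & c_2 & c_3 \end{vmatrix} = (1-t^2)\begin{vmatrix} a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \\ c_1 & c_2 & c_3 \end{vmatrix}$

27. $\begin{vmatrix} a_1+b_1 & a_1-b_1 & c_1 \\ a_2+b_2 & a_2-b_2 & c_2 \\ a_3+b_3 & a_3-b_3 & c_3 \end{vmatrix} = -2\begin{vmatrix} a_1 & b_1 & c_1 \\ a_2 & b_2 & c_2 \\ a_3 & b_3 & c_3 \end{vmatrix}$

28. $\begin{vmatrix} a_1 & b_1+ta_1 & c_1+rb_1+sa_1 \\ a_2 & b_2+ta_2 & c_2+rb_2+sa_2 \\ a_3 & b_3+ta_3 & c_3+rb_3+sa_3 \end{vmatrix} = \begin{vmatrix} a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \\ c_1 & c_2 & c_3 \end{vmatrix}$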
In Exercises 29-30, show that det(A) = 0 without directly evaluating the determinant.

29. $A = \begin{bmatrix} -2 & 8 & 1 & 4 \\ 3 & 2 & 5 & 1 \\ 1 & 10 & 6 & 5 \\ 4 & -6 & 4 & -3 \end{bmatrix}$

30. $A = \begin{bmatrix} -4 & 1 & 1 & 1 & 1 \\ 1 & -4 & 1 & 1 & 1 \\ 1 & 1 & -4 & 1 & 1 \\ 1 & 1 & 1 & -4 & 1 \\ 1 & 1 & 1 & 1 & -4 \end{bmatrix}$

It can be proved that if a square matrix M is partitioned into block triangular form as

\[
M = \begin{bmatrix} A & 0 \\ C & B \end{bmatrix} \quad \text{or} \quad M = \begin{bmatrix} A & C \\ 0 & B \end{bmatrix}
\]

in which A and B are square, then det(M) = det(A) det(B). Use this result to compute the determinants of the matrices in Exercises 31 and 32.

31. $M = \begin{bmatrix} 1 & 2 & 0 & 8 & 6 & -9 \\ 2 & 5 & 0 & 4 & 7 & 5 \\ -1 & 3 & 2 & 6 & 9 & -2 \\ 0 & 0 & 0 & 3 & 0 & 0 \\ 0 & 0 & 0 & 2 & 1 & 0 \\ 0 & 0 & 0 & -3 & 8 & -4 \end{bmatrix}$

32. $M = \begin{bmatrix} 1 & 2 & 0 & 0 & 0 \\ 0 & 1 & 2 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 2 \\ 2 & 0 & 0 & 0 & 1 \end{bmatrix}$

33. Let A be an n × n matrix, and let B be the matrix that results when the rows of A are written in reverse order. State a theorem that describes how det(A) and det(B) are related.

34. Find the determinant of the following matrix.

\[
\begin{bmatrix} a & b & b & b \\ b & a & b & b \\ b & b & a & b \\ b & b & b & a \end{bmatrix}
\]

True-False Exercises

TF. In parts (a)-(f) determine whether the statement is true or false, and justify your answer.

(a) If A is a 4 × 4 matrix and B is obtained from A by interchanging the first two rows and then interchanging the last two rows, then det(B) = det(A).

(b) If A is a 3 × 3 matrix and B is obtained from A by multiplying the first column by 4 and multiplying the third column by $\frac{3}{4}$, then det(B) = 3 det(A).

(c) If A is a 3 × 3 matrix and B is obtained from A by adding 5 times the first row to each of the second and third rows, then det(B) = 25 det(A).

(d) If A is an n × n matrix and B is obtained from A by multiplying each row of A by its row number, then

\[
\det(B) = \frac{n(n+1)}{2} \det(A)
\]

(e) If A is a square matrix with two identical columns, then det(A) = 0.

(f) If the sum of the second and fourth row vectors of a 6 × 6 matrix A is equal to the last row vector, then det(A) = 0.

Working with Technology

T1. Find the determinant of

\[
A = \begin{bmatrix} 4.2 & -1.3 & 1.1 & 6.0 \\ 0.0 & 0.0 & -3.2 & 3.4 \\ 4.5 & 1.3 & 0.0 & 14.8 \\ 4.7 & 1.0 & 3.4 & 2.3 \end{bmatrix}
\]

by reducing the matrix to reduced row echelon form, and compare the result obtained in this way to that obtained in Exercise T1 of Section 2.1.


2.3 Properties of Determinants; Cramer's Rule 

In this section we will develop some fundamental properties of matrices, and we will use 
these results to derive a formula for the inverse of an invertible matrix and formulas for the 
solutions of certain kinds of linear systems. 


Basic Properties of 
Determinants 


Suppose that A and B are n × n matrices and k is any scalar. We begin by considering possible relationships among det(A), det(B), and

\[
\det(kA), \quad \det(A + B), \quad \text{and} \quad \det(AB)
\]

Since a common factor of any row of a matrix can be moved through the determinant sign, and since each of the n rows in kA has a common factor of k, it follows that

\[
\det(kA) = k^n \det(A) \tag{1}
\]

For example,

\[
\begin{vmatrix} ka_{11} & ka_{12} & ka_{13} \\ ka_{21} & ka_{22} & ka_{23} \\ ka_{31} & ka_{32} & ka_{33} \end{vmatrix}
= k^3 \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix}
\]

Unfortunately, no simple relationship exists among det(A), det(B), and det(A + B). In particular, det(A + B) will usually not be equal to det(A) + det(B). The following example illustrates this fact.


► EXAMPLE 1 det(A + B) ≠ det(A) + det(B)

Consider

\[
A = \begin{bmatrix} 1 & 2 \\ 2 & 5 \end{bmatrix}, \quad
B = \begin{bmatrix} 3 & 1 \\ 1 & 3 \end{bmatrix}, \quad
A + B = \begin{bmatrix} 4 & 3 \\ 3 & 8 \end{bmatrix}
\]

We have det(A) = 1, det(B) = 8, and det(A + B) = 23; thus

\[
\det(A + B) \neq \det(A) + \det(B)
\]
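Formula (1) and the numbers in Example 1 can be checked numerically. The following is a minimal sketch, assuming a helper `det` written here for illustration (a plain cofactor expansion along the first row, suitable only for small matrices); it is not a routine from the text.

```python
def det(m):
    """Determinant by cofactor expansion along the first row (small matrices only)."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, entry in enumerate(m[0]):
        # Delete row 0 and column j to form the minor.
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(minor)
    return total

A = [[1, 2], [2, 5]]
B = [[3, 1], [1, 3]]
A_plus_B = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]
print(det(A), det(B), det(A_plus_B))  # 1 8 23

# Formula (1): det(kA) = k^n det(A), here with k = 3 and n = 2.
k, n = 3, 2
kA = [[k * a for a in row] for row in A]
print(det(kA) == k ** n * det(A))  # True
```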


In spite of the previous example, there is a useful relationship concerning sums of 
determinants that is applicable when the matrices involved are the same except for one 
row (column). For example, consider the following two matrices that differ only in the 
second row: 


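\[
A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}
\quad \text{and} \quad
B = \begin{bmatrix} a_{11} & a_{12} \\ b_{21} & b_{22} \end{bmatrix}
\]

Calculating the determinants of A and B, we obtain

\[
\begin{aligned}
\det(A) + \det(B) &= (a_{11}a_{22} - a_{12}a_{21}) + (a_{11}b_{22} - a_{12}b_{21}) \\
&= a_{11}(a_{22} + b_{22}) - a_{12}(a_{21} + b_{21}) \\
&= \det \begin{bmatrix} a_{11} & a_{12} \\ a_{21} + b_{21} & a_{22} + b_{22} \end{bmatrix}
\end{aligned}
\]

Thus

\[
\det \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}
+ \det \begin{bmatrix} a_{11} & a_{12} \\ b_{21} & b_{22} \end{bmatrix}
= \det \begin{bmatrix} a_{11} & a_{12} \\ a_{21} + b_{21} & a_{22} + b_{22} \end{bmatrix}
\]

This is a special case of the following general result.


THEOREM 2.3.1 Let A, B, and C be n × n matrices that differ only in a single row, say the rth, and assume that the rth row of C can be obtained by adding corresponding entries in the rth rows of A and B. Then

\[
\det(C) = \det(A) + \det(B)
\]

The same result holds for columns.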


► EXAMPLE 2 Sums of Determinants 

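We leave it to you to confirm the following equality by evaluating the determinants.

\[
\det \begin{bmatrix} 1 & 7 & 5 \\ 2 & 0 & 3 \\ 1+0 & 4+1 & 7+(-1) \end{bmatrix}
= \det \begin{bmatrix} 1 & 7 & 5 \\ 2 & 0 & 3 \\ 1 & 4 & 7 \end{bmatrix}
+ \det \begin{bmatrix} 1 & 7 & 5 \\ 2 & 0 & 3 \\ 0 & 1 & -1 \end{bmatrix}
\]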
Determinant of a Matrix Product

Considering the complexity of the formulas for determinants and matrix multiplication, it would seem unlikely that a simple relationship should exist between them. This is what makes the simplicity of our next result so surprising. We will show that if A and B are square matrices of the same size, then

\[
\det(AB) = \det(A)\det(B) \tag{2}
\]

The proof of this theorem is fairly intricate, so we will have to develop some preliminary results first. We begin with the special case of (2) in which A is an elementary matrix. Because this special case is only a prelude to (2), we call it a lemma.


LEMMA 2.3.2 If B is an n × n matrix and E is an n × n elementary matrix, then

\[
\det(EB) = \det(E)\det(B)
\]

Proof We will consider three cases, each in accordance with the row operation that produces the matrix E.

Case 1 If E results from multiplying a row of $I_n$ by k, then by Theorem 1.5.1, EB results from B by multiplying the corresponding row by k; so from Theorem 2.2.3(a) we have

\[
\det(EB) = k \det(B)
\]

But from Theorem 2.2.4(a) we have det(E) = k, so

\[
\det(EB) = \det(E)\det(B)
\]

Cases 2 and 3 The proofs of the cases where E results from interchanging two rows of $I_n$ or from adding a multiple of one row to another follow the same pattern as Case 1 and are left as exercises.


Remark It follows by repeated applications of Lemma 2.3.2 that if B is an n × n matrix and $E_1, E_2, \ldots, E_r$ are n × n elementary matrices, then

\[
\det(E_1 E_2 \cdots E_r B) = \det(E_1)\det(E_2) \cdots \det(E_r)\det(B) \tag{3}
\]


Determinant Test for Invertibility

Our next theorem provides an important criterion for determining whether a matrix is invertible. It also takes us a step closer to establishing Formula (2).


THEOREM 2.3.3 A square matrix A is invertible if and only if det(A) ≠ 0.


Proof Let R be the reduced row echelon form of A. As a preliminary step, we will show that det(A) and det(R) are both zero or both nonzero: Let $E_1, E_2, \ldots, E_r$ be the elementary matrices that correspond to the elementary row operations that produce R from A. Thus

\[
R = E_r \cdots E_2 E_1 A
\]

and from (3),

\[
\det(R) = \det(E_r) \cdots \det(E_2)\det(E_1)\det(A) \tag{4}
\]

We pointed out in the margin note that accompanies Theorem 2.2.4 that the determinant of an elementary matrix is nonzero. Thus, it follows from Formula (4) that det(A) and det(R) are either both zero or both nonzero, which sets the stage for the main part of the proof. If we assume first that A is invertible, then it follows from Theorem 1.6.4 that R = I and hence that det(R) = 1 (≠ 0). This, in turn, implies that det(A) ≠ 0, which is what we wanted to show.

Conversely, assume that det(A) ≠ 0. It follows from this that det(R) ≠ 0, which tells us that R cannot have a row of zeros. Thus, it follows from Theorem 1.4.3 that R = I and hence that A is invertible by Theorem 1.6.4.

It follows from Theorems 2.3.3 and 2.2.5 that a square matrix with two proportional rows or two proportional columns is not invertible.


Augustin Louis Cauchy (1789-1857)

Historical Note In 1815 the great French mathematician Augustin Cauchy published a landmark paper in which he gave the first systematic and modern treatment of determinants. It was in that paper that Theorem 2.3.4 was stated and proved in full generality for the first time. Special cases of the theorem had been stated and proved earlier, but it was Cauchy who made the final jump.

[Image: © Bettmann/CORBIS]


► EXAMPLE 3 Determinant Test for Invertibility

Since the first and third rows of

\[
A = \begin{bmatrix} 1 & 2 & 3 \\ 1 & 0 & 1 \\ 2 & 4 & 6 \end{bmatrix}
\]

are proportional, det(A) = 0. Thus A is not invertible.
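The determinant test is easy to try on small integer matrices. The sketch below is illustrative only: the names `det` and `is_invertible` are assumptions introduced here, and the test is exact because the entries are integers. With floating-point entries a computed determinant can be distorted by rounding, so a comparison with zero would need more care.

```python
def det(m):
    """Determinant by cofactor expansion along the first row (small matrices only)."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, entry in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(minor)
    return total

def is_invertible(m):
    """Theorem 2.3.3: a square matrix is invertible exactly when det(m) != 0."""
    return det(m) != 0

# The matrix of Example 3: rows 1 and 3 are proportional, so det = 0.
A = [[1, 2, 3], [1, 0, 1], [2, 4, 6]]
print(det(A), is_invertible(A))  # 0 False

# A matrix with nonzero determinant, for contrast.
B = [[1, 2], [2, 5]]
print(det(B), is_invertible(B))  # 1 True
```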


We are now ready for the main result concerning products of matrices. 

THEOREM 2.3.4 If A and B are square matrices of the same size, then 

\[
\det(AB) = \det(A)\det(B)
\]

Proof We divide the proof into two cases that depend on whether or not A is invertible. If the matrix A is not invertible, then by Theorem 1.6.5 neither is the product AB. Thus, from Theorem 2.3.3, we have det(AB) = 0 and det(A) = 0, so it follows that det(AB) = det(A) det(B).

Now assume that A is invertible. By Theorem 1.6.4, the matrix A is expressible as a product of elementary matrices, say

\[
A = E_1 E_2 \cdots E_r \tag{5}
\]

so

\[
AB = E_1 E_2 \cdots E_r B
\]

Applying (3) to this equation yields

\[
\det(AB) = \det(E_1)\det(E_2) \cdots \det(E_r)\det(B)
\]

and applying (3) again yields

\[
\det(AB) = \det(E_1 E_2 \cdots E_r)\det(B)
\]

which, from (5), can be written as det(AB) = det(A) det(B).

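► EXAMPLE 4 Verifying that det(AB) = det(A) det(B)

Consider the matrices

\[
A = \begin{bmatrix} 3 & 1 \\ 2 & 1 \end{bmatrix}, \quad
B = \begin{bmatrix} -1 & 3 \\ 5 & 8 \end{bmatrix}, \quad
AB = \begin{bmatrix} 2 & 17 \\ 3 & 14 \end{bmatrix}
\]

We leave it for you to verify that

\[
\det(A) = 1, \quad \det(B) = -23, \quad \text{and} \quad \det(AB) = -23
\]

Thus det(AB) = det(A) det(B), as guaranteed by Theorem 2.3.4.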
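The verification requested in Example 4 can also be carried out mechanically. The following sketch assumes two small helpers written here for illustration, `det2` for 2 × 2 determinants and `matmul` for the matrix product; neither name comes from the text.

```python
def det2(m):
    """Determinant of a 2 x 2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def matmul(x, y):
    """Product of two matrices given as lists of rows."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))]
            for i in range(len(x))]

A = [[3, 1], [2, 1]]
B = [[-1, 3], [5, 8]]
AB = matmul(A, B)

print(AB)                                     # [[2, 17], [3, 14]]
print(det2(A), det2(B), det2(AB))             # 1 -23 -23
print(det2(AB) == det2(A) * det2(B))          # True
```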

The following theorem gives a useful relationship between the determinant of an 
invertible matrix and the determinant of its inverse. 
THEOREM 2.3.5 If A is invertible, then

\[
\det(A^{-1}) = \frac{1}{\det(A)}
\]

Proof Since $A^{-1}A = I$, it follows that $\det(A^{-1}A) = \det(I)$. Therefore, we must have $\det(A^{-1})\det(A) = 1$. Since det(A) ≠ 0, the proof can be completed by dividing through by det(A).


Adjoint of a Matrix

Leonard Eugene Dickson (1874-1954)

Historical Note The use of the term adjoint for the transpose of the matrix of cofactors appears to have been introduced by the American mathematician L. E. Dickson in a research paper that he published in 1902.

[Image: Courtesy of the American Mathematical Society www.ams.org]

In a cofactor expansion we compute det (A) by multiplying the entries in a row or column 
by their cofactors and adding the resulting products. It turns out that if one multiplies 
the entries in any row by the corresponding cofactors from a different row, the sum of 
these products is always zero. (This result also holds for columns.) Although we omit 
the general proof, the next example illustrates this fact. 


► EXAMPLE 5 Entries and Cofactors from Different Rows

Let

\[
A = \begin{bmatrix} 3 & 2 & -1 \\ 1 & 6 & 3 \\ 2 & -4 & 0 \end{bmatrix}
\]

We leave it for you to verify that the cofactors of A are

\[
\begin{array}{lll}
C_{11} = 12 & C_{12} = 6 & C_{13} = -16 \\
C_{21} = 4 & C_{22} = 2 & C_{23} = 16 \\
C_{31} = 12 & C_{32} = -10 & C_{33} = 16
\end{array}
\]

so, for example, the cofactor expansion of det(A) along the first row is

\[
\det(A) = 3C_{11} + 2C_{12} + (-1)C_{13} = 36 + 12 + 16 = 64
\]

and along the first column is

\[
\det(A) = 3C_{11} + C_{21} + 2C_{31} = 36 + 4 + 24 = 64
\]

Suppose, however, we multiply the entries in the first row by the corresponding cofactors from the second row and add the resulting products. The result is

\[
3C_{21} + 2C_{22} + (-1)C_{23} = 12 + 4 - 16 = 0
\]

Or suppose we multiply the entries in the first column by the corresponding cofactors from the second column and add the resulting products. The result is again zero since

\[
3C_{12} + 1C_{22} + 2C_{32} = 18 + 2 - 20 = 0 \quad ◄
\]


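DEFINITION 1 If A is any n × n matrix and $C_{ij}$ is the cofactor of $a_{ij}$, then the matrix

\[
\begin{bmatrix}
C_{11} & C_{12} & \cdots & C_{1n} \\
C_{21} & C_{22} & \cdots & C_{2n} \\
\vdots & \vdots & & \vdots \\
C_{n1} & C_{n2} & \cdots & C_{nn}
\end{bmatrix}
\]

is called the matrix of cofactors from A. The transpose of this matrix is called the adjoint of A and is denoted by adj(A).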
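► EXAMPLE 6 Adjoint of a 3 × 3 Matrix

Let

\[
A = \begin{bmatrix} 3 & 2 & -1 \\ 1 & 6 & 3 \\ 2 & -4 & 0 \end{bmatrix}
\]

As noted in Example 5, the cofactors of A are

\[
\begin{array}{lll}
C_{11} = 12 & C_{12} = 6 & C_{13} = -16 \\
C_{21} = 4 & C_{22} = 2 & C_{23} = 16 \\
C_{31} = 12 & C_{32} = -10 & C_{33} = 16
\end{array}
\]

so the matrix of cofactors is

\[
\begin{bmatrix} 12 & 6 & -16 \\ 4 & 2 & 16 \\ 12 & -10 & 16 \end{bmatrix}
\]

and the adjoint of A is

\[
\operatorname{adj}(A) = \begin{bmatrix} 12 & 4 & 12 \\ 6 & 2 & -10 \\ -16 & 16 & 16 \end{bmatrix}
\]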
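The cofactors and the adjoint in Examples 5 and 6 can be recomputed directly from Definition 1. The sketch below is an illustration under stated assumptions: the helper names `minor`, `det`, `cofactor_matrix`, and `adjoint` are introduced here and are not from the text. The final check multiplies A by adj(A); the zeros off the diagonal are exactly the "entries times cofactors from a different row" sums discussed in Example 5, and the diagonal entries are the cofactor expansions of det(A).

```python
def minor(m, i, j):
    """Matrix obtained by deleting row i and column j (0-based indices)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(m) if k != i]

def det(m):
    """Determinant by cofactor expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det(minor(m, 0, j)) for j in range(len(m)))

def cofactor_matrix(m):
    """Matrix whose (i, j) entry is the cofactor C_ij = (-1)^(i+j) M_ij."""
    n = len(m)
    return [[(-1) ** (i + j) * det(minor(m, i, j)) for j in range(n)]
            for i in range(n)]

def adjoint(m):
    """Transpose of the matrix of cofactors."""
    return [list(col) for col in zip(*cofactor_matrix(m))]

A = [[3, 2, -1], [1, 6, 3], [2, -4, 0]]
print(cofactor_matrix(A))  # [[12, 6, -16], [4, 2, 16], [12, -10, 16]]
print(adjoint(A))          # [[12, 4, 12], [6, 2, -10], [-16, 16, 16]]

adjA = adjoint(A)
product = [[sum(A[i][k] * adjA[k][j] for k in range(3)) for j in range(3)]
           for i in range(3)]
print(product)             # [[64, 0, 0], [0, 64, 0], [0, 0, 64]]
```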


In Theorem 1.4.5 we gave a formula for the inverse of a 2 x 2 invertible matrix. Our 
next theorem extends that result to n x n invertible matrices. 


It follows from Theorems 2.3.5 and 2.1.2 that if A is an invertible triangular matrix, then

\[
\det(A^{-1}) = \frac{1}{a_{11}} \cdot \frac{1}{a_{22}} \cdots \frac{1}{a_{nn}}
\]

Moreover, by using the adjoint formula it is possible to show that

\[
\frac{1}{a_{11}}, \ \frac{1}{a_{22}}, \ \ldots, \ \frac{1}{a_{nn}}
\]

are actually the successive diagonal entries of $A^{-1}$ (compare A and $A^{-1}$ in Example 3 of Section 1.7).


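THEOREM 2.3.6 Inverse of a Matrix Using Its Adjoint

If A is an invertible matrix, then

\[
A^{-1} = \frac{1}{\det(A)} \operatorname{adj}(A) \tag{6}
\]

Proof We show first that

\[
A \operatorname{adj}(A) = \det(A) I
\]

Consider the product

\[
A \operatorname{adj}(A) =
\begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & & \vdots \\
a_{i1} & a_{i2} & \cdots & a_{in} \\
\vdots & \vdots & & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nn}
\end{bmatrix}
\begin{bmatrix}
C_{11} & C_{21} & \cdots & C_{j1} & \cdots & C_{n1} \\
C_{12} & C_{22} & \cdots & C_{j2} & \cdots & C_{n2} \\
\vdots & \vdots & & \vdots & & \vdots \\
C_{1n} & C_{2n} & \cdots & C_{jn} & \cdots & C_{nn}
\end{bmatrix}
\]

The entry in the ith row and jth column of the product A adj(A) is

\[
a_{i1}C_{j1} + a_{i2}C_{j2} + \cdots + a_{in}C_{jn} \tag{7}
\]

(the ith row of A multiplied by the jth column of adj(A)).

If i = j, then (7) is the cofactor expansion of det(A) along the ith row of A (Theorem 2.1.1), and if i ≠ j, then the a's and the cofactors come from different rows of A, so the value of (7) is zero (as illustrated in Example 5). Therefore,

\[
A \operatorname{adj}(A) =
\begin{bmatrix}
\det(A) & 0 & \cdots & 0 \\
0 & \det(A) & \cdots & 0 \\
\vdots & \vdots & & \vdots \\
0 & 0 & \cdots & \det(A)
\end{bmatrix}
= \det(A) I \tag{8}
\]

Since A is invertible, det(A) ≠ 0. Therefore, Equation (8) can be rewritten as

\[
\frac{1}{\det(A)} [A \operatorname{adj}(A)] = I
\quad \text{or} \quad
A \left[ \frac{1}{\det(A)} \operatorname{adj}(A) \right] = I
\]

Multiplying both sides on the left by $A^{-1}$ yields

\[
A^{-1} = \frac{1}{\det(A)} \operatorname{adj}(A)
\]


Cramer's Rule

Gabriel Cramer (1704-1752)

Historical Note Variations of Cramer's rule were fairly well known before the Swiss mathematician discussed it in work he published in 1750. It was Cramer's superior notation that popularized the method and led mathematicians to attach his name to it.

[Image: Science Source/Photo Researchers]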


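► EXAMPLE 7 Using the Adjoint to Find an Inverse Matrix

Use Formula (6) to find the inverse of the matrix A in Example 6.

Solution We showed in Example 5 that det(A) = 64. Thus,

\[
A^{-1} = \frac{1}{\det(A)} \operatorname{adj}(A)
= \frac{1}{64} \begin{bmatrix} 12 & 4 & 12 \\ 6 & 2 & -10 \\ -16 & 16 & 16 \end{bmatrix}
= \begin{bmatrix}
\frac{12}{64} & \frac{4}{64} & \frac{12}{64} \\[2pt]
\frac{6}{64} & \frac{2}{64} & -\frac{10}{64} \\[2pt]
-\frac{16}{64} & \frac{16}{64} & \frac{16}{64}
\end{bmatrix}
\]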
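Formula (6) translates directly into a short computation. The sketch below is a hedged illustration, not an efficient algorithm: the names `minor`, `det`, `adjoint`, and `inverse_by_adjoint` are assumptions introduced here, and exact `Fraction` arithmetic is used so the result can be compared with the fractions in Example 7. The last lines confirm that the computed matrix really is an inverse by multiplying it with A.

```python
from fractions import Fraction

def minor(m, i, j):
    """Matrix obtained by deleting row i and column j (0-based indices)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(m) if k != i]

def det(m):
    """Determinant by cofactor expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det(minor(m, 0, j)) for j in range(len(m)))

def adjoint(m):
    """Transpose of the matrix of cofactors."""
    n = len(m)
    cof = [[(-1) ** (i + j) * det(minor(m, i, j)) for j in range(n)]
           for i in range(n)]
    return [list(col) for col in zip(*cof)]

def inverse_by_adjoint(m):
    """Formula (6): inverse = adj(m) / det(m), valid only when det(m) != 0."""
    d = det(m)
    if d == 0:
        raise ValueError("matrix is not invertible (det = 0)")
    return [[Fraction(entry, d) for entry in row] for row in adjoint(m)]

A = [[3, 2, -1], [1, 6, 3], [2, -4, 0]]
A_inv = inverse_by_adjoint(A)
print(A_inv[0])  # [Fraction(3, 16), Fraction(1, 16), Fraction(3, 16)]

# Check: A times the computed inverse should be the identity matrix.
product = [[sum(A[i][k] * A_inv[k][j] for k in range(3)) for j in range(3)]
           for i in range(3)]
print(product == [[1, 0, 0], [0, 1, 0], [0, 0, 1]])  # True
```

Note that 12/64 reduces to 3/16, so the printed first row agrees with the first row of the matrix in Example 7.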


Our next theorem uses the formula for the inverse of an invertible matrix to produce a formula, called Cramer's rule, for the solution of a linear system Ax = b of n equations in n unknowns in the case where the coefficient matrix A is invertible (or, equivalently, when det(A) ≠ 0).


THEOREM 2.3.7 Cramer's Rule

If Ax = b is a system of n linear equations in n unknowns such that det(A) ≠ 0, then the system has a unique solution. This solution is

\[
x_1 = \frac{\det(A_1)}{\det(A)}, \quad x_2 = \frac{\det(A_2)}{\det(A)}, \quad \ldots, \quad x_n = \frac{\det(A_n)}{\det(A)}
\]

where $A_j$ is the matrix obtained by replacing the entries in the jth column of A by the entries in the matrix

\[
\mathbf{b} = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{bmatrix}
\]

Proof If det(A) ≠ 0, then A is invertible, and by Theorem 1.6.2, $\mathbf{x} = A^{-1}\mathbf{b}$ is the unique solution of Ax = b. Therefore, by Theorem 2.3.6 we have

\[
\mathbf{x} = A^{-1}\mathbf{b} = \frac{1}{\det(A)} \operatorname{adj}(A)\mathbf{b}
= \frac{1}{\det(A)}
\begin{bmatrix}
C_{11} & C_{21} & \cdots & C_{n1} \\
C_{12} & C_{22} & \cdots & C_{n2} \\
\vdots & \vdots & & \vdots \\
C_{1n} & C_{2n} & \cdots & C_{nn}
\end{bmatrix}
\begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{bmatrix}
\]

Multiplying the matrices out gives

\[
\mathbf{x} = \frac{1}{\det(A)}
\begin{bmatrix}
b_1C_{11} + b_2C_{21} + \cdots + b_nC_{n1} \\
b_1C_{12} + b_2C_{22} + \cdots + b_nC_{n2} \\
\vdots \\
b_1C_{1n} + b_2C_{2n} + \cdots + b_nC_{nn}
\end{bmatrix}
\]

The entry in the jth row of x is therefore

\[
x_j = \frac{b_1C_{1j} + b_2C_{2j} + \cdots + b_nC_{nj}}{\det(A)} \tag{9}
\]

Now let

\[
A_j =
\begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1\,j-1} & b_1 & a_{1\,j+1} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2\,j-1} & b_2 & a_{2\,j+1} & \cdots & a_{2n} \\
\vdots & \vdots & & \vdots & \vdots & \vdots & & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{n\,j-1} & b_n & a_{n\,j+1} & \cdots & a_{nn}
\end{bmatrix}
\]

Since $A_j$ differs from A only in the jth column, it follows that the cofactors of entries $b_1, b_2, \ldots, b_n$ in $A_j$ are the same as the cofactors of the corresponding entries in the jth column of A. The cofactor expansion of $\det(A_j)$ along the jth column is therefore

\[
\det(A_j) = b_1C_{1j} + b_2C_{2j} + \cdots + b_nC_{nj}
\]

Substituting this result in (9) gives

\[
x_j = \frac{\det(A_j)}{\det(A)}
\]

For n > 3, it is usually more efficient to solve a linear system with n equations in n unknowns by Gauss-Jordan elimination than by Cramer's rule. Its main use is for obtaining properties of solutions of a linear system without actually solving the system.


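► EXAMPLE 8 Using Cramer's Rule to Solve a Linear System

Use Cramer's rule to solve

\[
\begin{aligned}
x_1 \phantom{{}+ 4x_2} + 2x_3 &= 6 \\
-3x_1 + 4x_2 + 6x_3 &= 30 \\
-x_1 - 2x_2 + 3x_3 &= 8
\end{aligned}
\]

Solution

\[
A = \begin{bmatrix} 1 & 0 & 2 \\ -3 & 4 & 6 \\ -1 & -2 & 3 \end{bmatrix}, \quad
A_1 = \begin{bmatrix} 6 & 0 & 2 \\ 30 & 4 & 6 \\ 8 & -2 & 3 \end{bmatrix},
\]

\[
A_2 = \begin{bmatrix} 1 & 6 & 2 \\ -3 & 30 & 6 \\ -1 & 8 & 3 \end{bmatrix}, \quad
A_3 = \begin{bmatrix} 1 & 0 & 6 \\ -3 & 4 & 30 \\ -1 & -2 & 8 \end{bmatrix}
\]

Therefore,

\[
x_1 = \frac{\det(A_1)}{\det(A)} = \frac{-40}{44} = -\frac{10}{11}, \quad
x_2 = \frac{\det(A_2)}{\det(A)} = \frac{72}{44} = \frac{18}{11}, \quad
x_3 = \frac{\det(A_3)}{\det(A)} = \frac{152}{44} = \frac{38}{11}
\]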
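Cramer's rule, as stated in the theorem above, can be sketched as follows. This is an illustrative sketch rather than a recommended solver (for larger systems elimination is usually more efficient); the function names `det` and `cramer` are assumptions introduced here, and `Fraction` is used so the output matches the exact fractions of Example 8.

```python
from fractions import Fraction

def det(m):
    """Determinant by cofactor expansion along the first row (small matrices only)."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, entry in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(minor)
    return total

def cramer(a, b):
    """Solve a x = b by Cramer's rule; requires det(a) != 0."""
    d = det(a)
    if d == 0:
        raise ValueError("Cramer's rule does not apply: det(A) = 0")
    solution = []
    for j in range(len(a)):
        # A_j: replace column j of A by the entries of b.
        a_j = [row[:j] + [b[i]] + row[j + 1:] for i, row in enumerate(a)]
        solution.append(Fraction(det(a_j), d))
    return solution

A = [[1, 0, 2], [-3, 4, 6], [-1, -2, 3]]
b = [6, 30, 8]
print(cramer(A, b))  # [Fraction(-10, 11), Fraction(18, 11), Fraction(38, 11)]
```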




Equivalence Theorem

In Theorem 1.6.4 we listed five results that are equivalent to the invertibility of a matrix A. We conclude this section by merging Theorem 2.3.3 with that list to produce the following theorem that relates all of the major topics we have studied thus far.


THEOREM 2.3.8 Equivalent Statements

If A is an n × n matrix, then the following statements are equivalent.

(a) A is invertible.

(b) Ax = 0 has only the trivial solution.

(c) The reduced row echelon form of A is $I_n$.

(d) A can be expressed as a product of elementary matrices.

(e) Ax = b is consistent for every n × 1 matrix b.

(f) Ax = b has exactly one solution for every n × 1 matrix b.

(g) det(A) ≠ 0.


OPTIONAL


We now have all of the machinery necessary to prove the following two results, which we 
stated without proof in Theorem 1.7.1: 

Theorem 1.7.1(c) A triangular matrix is invertible if and only if its diagonal entries 
are all nonzero. 

Theorem 1.7.1(d) The inverse of an invertible lower triangular matrix is lower trian- 
gular, and the inverse of an invertible upper triangular matrix is upper triangular. 


Proof of Theorem 1.7.1(c) Let $A = [a_{ij}]$ be a triangular matrix, so that its diagonal entries are

\[
a_{11}, a_{22}, \ldots, a_{nn}
\]

From Theorem 2.1.2, the matrix A is invertible if and only if

\[
\det(A) = a_{11}a_{22} \cdots a_{nn}
\]

is nonzero, which is true if and only if the diagonal entries are all nonzero.


Proof of Theorem 1.7.1(d) We will prove the result for upper triangular matrices and leave the lower triangular case for you. Assume that A is upper triangular and invertible. Since

\[
A^{-1} = \frac{1}{\det(A)} \operatorname{adj}(A)
\]

we can prove that $A^{-1}$ is upper triangular by showing that adj(A) is upper triangular or, equivalently, that the matrix of cofactors is lower triangular. We can do this by showing that every cofactor $C_{ij}$ with i < j (i.e., above the main diagonal) is zero. Since

\[
C_{ij} = (-1)^{i+j} M_{ij}
\]

it suffices to show that each minor $M_{ij}$ with i < j is zero. For this purpose, let $B_{ij}$ be the matrix that results when the ith row and jth column of A are deleted, so

\[
M_{ij} = \det(B_{ij}) \tag{10}
\]

From the assumption that i < j, it follows that $B_{ij}$ is upper triangular (see Figure 1.7.1). Since A is upper triangular, its (i + 1)-st row begins with at least i zeros. But the ith row of $B_{ij}$ is the (i + 1)-st row of A with the entry in the jth column removed. Since i < j, none of the first i zeros is removed by deleting the jth column; thus the ith row of $B_{ij}$ starts with at least i zeros, which implies that this row has a zero on the main diagonal. It now follows from Theorem 2.1.2 that $\det(B_{ij}) = 0$ and from (10) that $M_{ij} = 0$.
Exercise Set 2.3

In Exercises 1-4, verify that det(kA) = $k^n$ det(A).

1. $A = \begin{bmatrix} -1 & 2 \\ 3 & 4 \end{bmatrix}$; k = 2

2. $A = \begin{bmatrix} 2 & 2 \\ 5 & -2 \end{bmatrix}$; k = −4

3. $A = \begin{bmatrix} 2 & -1 & 3 \\ 3 & 2 & 1 \\ 1 & 4 & 5 \end{bmatrix}$; k = −2

4. $A = \begin{bmatrix} 1 & 1 & 1 \\ 0 & 2 & 3 \\ 0 & 1 & -2 \end{bmatrix}$; k = 3

In Exercises 5-6, verify that det(AB) = det(BA) and determine whether the equality det(A + B) = det(A) + det(B) holds.

5. $A = \begin{bmatrix} 2 & 1 & 0 \\ 3 & 4 & 0 \\ 0 & 0 & 2 \end{bmatrix}$ and $B = \begin{bmatrix} 1 & -1 & 3 \\ 7 & 1 & 2 \\ 5 & 0 & 1 \end{bmatrix}$

6. $A = \begin{bmatrix} -1 & 8 & 2 \\ 1 & 0 & -1 \\ -2 & 2 & 2 \end{bmatrix}$ and $B = \begin{bmatrix} 2 & -1 & -4 \\ 1 & 1 & 3 \\ 0 & 3 & -1 \end{bmatrix}$

In Exercises 7-14, use determinants to decide whether the given matrix is invertible.

7. $A = \begin{bmatrix} 2 & 5 & 5 \\ -1 & -1 & 0 \\ 2 & 4 & 3 \end{bmatrix}$

8. $A = \begin{bmatrix} 2 & 0 & 3 \\ 0 & 3 & 2 \\ -2 & 0 & -4 \end{bmatrix}$

9. $A = \begin{bmatrix} 2 & -3 & 5 \\ 0 & 1 & -3 \\ 0 & 0 & 2 \end{bmatrix}$

10. $A = \begin{bmatrix} -3 & 0 & 1 \\ 5 & 0 & 6 \\ 8 & 0 & 3 \end{bmatrix}$

11. $A = \begin{bmatrix} 4 & 2 & 8 \\ -2 & 1 & -4 \\ 3 & 1 & 6 \end{bmatrix}$

12. $A = \begin{bmatrix} 1 & 0 & -1 \\ 9 & -1 & 4 \\ 8 & 9 & -1 \end{bmatrix}$

13. $A = \begin{bmatrix} 2 & 0 & 0 \\ 8 & 1 & 0 \\ -5 & 3 & 6 \end{bmatrix}$

14. $A = \begin{bmatrix} \sqrt{2} & -\sqrt{7} & 0 \\ 3\sqrt{2} & -3\sqrt{7} & 0 \\ 5 & -9 & 0 \end{bmatrix}$

In Exercises 15-18, find the values of k for which the matrix A is invertible.

15. $A = \begin{bmatrix} k-3 & -2 \\ -2 & k-2 \end{bmatrix}$

16. $A = \begin{bmatrix} k & 2 \\ 2 & k \end{bmatrix}$

17. $A = \begin{bmatrix} 1 & 2 & 4 \\ 3 & 1 & 6 \\ k & 3 & 2 \end{bmatrix}$

18. $A = \begin{bmatrix} 1 & 2 & 0 \\ k & 1 & k \\ 0 & 2 & 1 \end{bmatrix}$

In Exercises 19-23, decide whether the matrix is invertible, and if so, use the adjoint method to find its inverse.

19. $A = \begin{bmatrix} 2 & 5 & 5 \\ -1 & -1 & 0 \\ 2 & 4 & 3 \end{bmatrix}$

20. $A = \begin{bmatrix} 2 & 0 & 3 \\ 0 & 3 & 2 \\ -2 & 0 & -4 \end{bmatrix}$

21. $A = \begin{bmatrix} 2 & -3 & 5 \\ 0 & 1 & -3 \\ 0 & 0 & 2 \end{bmatrix}$

22. $A = \begin{bmatrix} 2 & 0 & 0 \\ 8 & 1 & 0 \\ -5 & 3 & 6 \end{bmatrix}$

23. $A = \begin{bmatrix} 1 & 3 & 1 & 1 \\ 2 & 5 & 2 & 2 \\ 1 & 3 & 8 & 9 \\ 1 & 3 & 2 & 2 \end{bmatrix}$

In Exercises 24-29, solve by Cramer's rule, where it applies.

24. $7x_1 - 2x_2 = 3$
    $3x_1 + x_2 = 5$

25. $4x + 5y = 2$
    $11x + y + 2z = 3$
    $x + 5y + 2z = 1$

26. $x - 4y + z = 6$
    $4x - y + 2z = -1$
    $2x + 2y - 3z = -20$

27. $x_1 - 3x_2 + x_3 = 4$
    $2x_1 - x_2 = -2$
    $4x_1 - 3x_3 = 0$

28. $-x_1 - 4x_2 + 2x_3 + x_4 = -32$
    $2x_1 - x_2 + 7x_3 + 9x_4 = 14$
    $-x_1 + x_2 + 3x_3 + x_4 = 11$
    $x_1 - 2x_2 + x_3 - 4x_4 = -4$

29. $3x_1 - x_2 + x_3 = 4$
    $-x_1 + 7x_2 - 2x_3 = 1$
    $2x_1 + 6x_2 - x_3 = 5$

30. Show that the matrix

\[
A = \begin{bmatrix} \cos\theta & \sin\theta & 0 \\ -\sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix}
\]

is invertible for all values of θ; then find $A^{-1}$ using Theorem 2.3.6.

31. Use Cramer's rule to solve for y without solving for the unknowns x, z, and w.

    $4x + y + z + w = 6$
    $3x + 7y - z + w = 1$
    $7x + 3y - 5z + 8w = -3$
    $x + y + z + 2w = 3$

32. Let Ax = b be the system in Exercise 31.

(a) Solve by Cramer's rule.

(b) Solve by Gauss-Jordan elimination.

(c) Which method involves fewer computations?

33. Let

\[
A = \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \end{bmatrix}
\]

Assuming that det(A) = −7, find

(a) det(3A) (b) det($A^{-1}$) (c) det($2A^{-1}$) (d) det($(2A)^{-1}$)

(e) $\det \begin{bmatrix} a & g & d \\ b & h & e \\ c & i & f \end{bmatrix}$

34. In each part, find the determinant given that A is a 4 × 4 matrix for which det(A) = −2.

(a) det(−A) (b) det($A^{-1}$) (c) det($2A^T$) (d) det($A^3$)

35. In each part, find the determinant given that A is a 3 × 3 matrix for which det(A) = 7.

(a) det(3A) (b) det($A^{-1}$) (c) det($2A^{-1}$) (d) det($(2A)^{-1}$)

Working with Proofs

36. Prove that a square matrix A is invertible if and only if $A^TA$ is invertible.

37. Prove that if A is a square matrix, then det($A^TA$) = det($AA^T$).

38. Let Ax = b be a system of n linear equations in n unknowns with integer coefficients and integer constants. Prove that if det(A) = 1, the solution x has integer entries.

39. Prove that if det(A) = 1 and all the entries in A are integers, then all the entries in $A^{-1}$ are integers.

True-False Exercises

TF. In parts (a)-(l) determine whether the statement is true or false, and justify your answer.

(a) If A is a 3 × 3 matrix, then det(2A) = 2 det(A).

(b) If A and B are square matrices of the same size such that det(A) = det(B), then det(A + B) = 2 det(A).

(c) If A and B are square matrices of the same size and A is invertible, then

\[
\det(A^{-1}BA) = \det(B)
\]

(d) A square matrix A is invertible if and only if det(A) = 0.

(e) The matrix of cofactors of A is precisely $[\operatorname{adj}(A)]^T$.

(f) For every n × n matrix A, we have

\[
A \cdot \operatorname{adj}(A) = (\det(A)) I_n
\]

(g) If A is a square matrix and the linear system Ax = 0 has multiple solutions for x, then det(A) = 0.

(h) If A is an n × n matrix and there exists an n × 1 matrix b such that the linear system Ax = b has no solutions, then the reduced row echelon form of A cannot be $I_n$.

(i) If E is an elementary matrix, then Ex = 0 has only the trivial solution.

(j) If A is an invertible matrix, then the linear system Ax = 0 has only the trivial solution if and only if the linear system $A^{-1}$x = 0 has only the trivial solution.

(k) If A is invertible, then adj(A) must also be invertible.

(l) If A has a row of zeros, then so does adj(A).

Working with Technology

T1. Consider the matrix

\[
A = \begin{bmatrix} 1 & 1 \\ 1 & 1 + \epsilon \end{bmatrix}
\]

in which ε > 0. Since det(A) = ε ≠ 0, it follows from Theorem 2.3.8 that A is invertible. Compute det(A) for various small nonzero values of ε until you find a value that produces det(A) = 0, thereby leading you to conclude erroneously that A is not invertible. Discuss the cause of this.

T2. We know from Exercise 37 that if A is a square matrix then det($A^TA$) = det($AA^T$). By experimenting, make a conjecture as to whether this is true if A is not square.

T3. The French mathematician Jacques Hadamard (1865-1963) proved that if A is an n × n matrix each of whose entries satisfies the condition $|a_{ij}| \le M$, then

\[
|\det(A)| \le \sqrt{n^n}\, M^n
\]

(Hadamard's inequality). For the following matrix A, use this result to find an interval of possible values for det(A), and then use your technology utility to show that the value of det(A) falls within this interval.

\[
A = \begin{bmatrix} -0.3 & -2.4 & -1.7 & 2.5 \\ 0.2 & -0.3 & -1.2 & 1.4 \\ 2.5 & 2.3 & 0.0 & 1.8 \\ 1.7 & 1.0 & -2.1 & 2.3 \end{bmatrix}
\]
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Chapter 2 Supplementary Exercises 

In Exercises 1-8, evaluate the determinant of the given matrix 
by (a) cofactor expansion and fb) using elementary row operations 
to introduce zeros into the matrix. 


1. 


3. 


5. 


-4 2 
3 3 

'-1 
0 
-3 


5 2 

2 -1 

1 1 


7. 


3 

1 

0 

3 

-2 

1 


0 -1 
1 1 

4 2 

6 0 

3 1 

0 -1 
2 -2 


2 . 


4. 


6 . 


8 . 


7 -1 

-2 -6 


-1 -2 
-4 
-7 


-3 
-5 -6 

-8 -9 


-5 

3 

1 


1 

0 

-2 


-1 -2 -3 -4 

4 3 2 1 

12 3 4 

-4 -3 -2 -1 


9. Evaluate the determinants in Exercises 3-6 by using the arrow 
technique (see Example 7 in Section 2.1). 

10. (a) Construct a 4 x 4 matrix whose determinant is easy to 

compute using cofactor expansion but hard to evaluate 
using elementary row operations. 

(b) Construct a 4 x 4 matrix whose determinant is easy to 
compute using elementary row operations but hard to 
evaluate using cofactor expansion. 

11. Use the determinant to decide whether the matrices in Exer- 
cises 1-4 are invertible. 

12. Use the determinant to decide whether the matrices in Exer- 
cises 5-8 are invertible. 

In Exercises , find the given determinant by any me- 
thod. 


13. 


5 b - 3 
b- 2 -3 


14. 


-4 

1 

a — 1 


15 . 


0 

0 

0 

0 

-3 


0 

0 

0 

-4 

0 


0 

0 

-1 

0 

0 


0 

2 

0 

0 

0 


5 

0 

0 

0 

0 


Solve for x 







x 

-1 





3 1 

— X 



1 0 -3 

2 x —6 

1 3 x - 5 


In Exercises 17-24, use the adjoint method (Theorem 2.3.6) to 
find the inverse of the given matrix, if it exists. 

17. The matrix in Exercise 1. 18. The matrix in Exercise 2. 

19. The matrix in Exercise 3. 20. The matrix in Exercise 4. 

21. The matrix in Exercise 5. 22. The matrix in Exercise 6. 

23. The matrix in Exercise 7. 24. The matrix in Exercise 8. 

25. Use Cramer’s rule to solve for x' and y' in terms of x and y. 

x=\x'-\y' 

y = \x' + §/ 

26. Use Cramer’s rule to solve for x' and y' in terms of x and y. 

x = x' cos 6 — y' sin 6 
y = x' sin 9 + y' cos 8 

27. By examining the determinant of the coefficient matrix, show 
that the following system has a nontrivial solution if and only 
if or = jS. 

x + y + az = 0 
x + y + fiz = 0 
ax + f)y + z = 0 

28. Let A be a 3 x 3 matrix, each of whose entries is 1 or 0. What 
is the largest possible value for det(A)? 

29. (a) For the triangle in the accompanying figure, use trigonometry
to show that

$b \cos\gamma + c \cos\beta = a$
$c \cos\alpha + a \cos\gamma = b$
$a \cos\beta + b \cos\alpha = c$

and then apply Cramer’s rule to show that

$\cos\alpha = \dfrac{b^2 + c^2 - a^2}{2bc}$

(b) Use Cramer’s rule to obtain similar formulas for $\cos\beta$ and
$\cos\gamma$.



◄ Figure Ex-29 

30. Use determinants to show that for all real values of $\lambda$, the only
solution of

$x - 2y = \lambda x$
$x - y = \lambda y$

is x = 0, y = 0.

31. Prove: If A is invertible, then adj(A) is invertible and

$[\operatorname{adj}(A)]^{-1} = \dfrac{1}{\det(A)} A = \operatorname{adj}(A^{-1})$




32. Prove: If A is an n × n matrix, then

$\det[\operatorname{adj}(A)] = [\det(A)]^{n-1}$

33. Prove: If the entries in each row of an n × n matrix A add up
to zero, then the determinant of A is zero. [Hint: Consider
the product Ax, where x is the n × 1 matrix, each of whose
entries is one.]



◄ Figure Ex-34 


34. (a) In the accompanying figure, the area of the triangle ABC
can be expressed as

area ABC = area ADEC + area CEFB − area ADFB

Use this and the fact that the area of a trapezoid equals
1/2 the altitude times the sum of the parallel sides to show
that

$\text{area } ABC = \dfrac{1}{2} \begin{vmatrix} x_1 & y_1 & 1 \\ x_2 & y_2 & 1 \\ x_3 & y_3 & 1 \end{vmatrix}$

[Note: In the derivation of this formula, the vertices are
labeled such that the triangle is traced counterclockwise
proceeding from $(x_1, y_1)$ to $(x_2, y_2)$ to $(x_3, y_3)$. For a
clockwise orientation, the determinant above yields the
negative of the area.]

(b) Use the result in (a) to find the area of the triangle with
vertices (3, 3), (4, 0), (−2, −1).


35. Use the fact that 

21375, 38798, 34162, 40223, 79154 
are all divisible by 19 to show that 

$\begin{vmatrix} 2 & 1 & 3 & 7 & 5 \\ 3 & 8 & 7 & 9 & 8 \\ 3 & 4 & 1 & 6 & 2 \\ 4 & 0 & 2 & 2 & 3 \\ 7 & 9 & 1 & 5 & 4 \end{vmatrix}$

is divisible by 19 without directly evaluating the determinant. 

36. Without directly evaluating the determinant, show that 


$\begin{vmatrix} \sin\alpha & \cos\alpha & \sin(\alpha + \delta) \\ \sin\beta & \cos\beta & \sin(\beta + \delta) \\ \sin\gamma & \cos\gamma & \sin(\gamma + \delta) \end{vmatrix} = 0$


INTRODUCTION

Engineers and physicists distinguish between two types of physical quantities:
scalars, which are quantities that can be described by a numerical value alone, and
vectors, which are quantities that require both a number and a direction for their 
complete physical description. For example, temperature, length, and speed are scalars 
because they can be fully described by a number that tells “how much” — a temperature 
of 20° C, a length of 5 cm, or a speed of 75 km/h. In contrast, velocity and force are 
vectors because they require a number that tells “how much” and a direction that tells 
“which way” — say, a boat moving at 10 knots in a direction 45° northeast, or a force of 
100 lb acting vertically. Although the notions of vectors and scalars that we will study 
in this text have their origins in physics and engineering, we will be more concerned 
with using them to build mathematical structures and then applying those structures to 
such diverse fields as genetics, computer science, economics, telecommunications, and 
environmental science. 


3.1 Vectors in 2-Space, 3-Space, and n-Space 

Linear algebra is primarily concerned with two types of mathematical objects, “matrices”
and “vectors.” In Chapter 1 we discussed the basic properties of matrices, we introduced
the idea of viewing n-tuples of real numbers as vectors, and we denoted the set of all such
n-tuples as $R^n$. In this section we will review the basic properties of vectors in two and three
dimensions with the goal of extending these properties to vectors in $R^n$.


Geometric Vectors 



▲ Figure 3.1.1


Engineers and physicists represent vectors in two dimensions (also called 2-space ) or 
in three dimensions (also called 3-space ) by arrows. The direction of the arrowhead 
specifies the direction of the vector and the length of the arrow specifies the magnitude. 
Mathematicians call these geometric vectors. The tail of the arrow is called the initial 
point of the vector and the tip the terminal point (Figure 3.1.1). 

In this text we will denote vectors in boldface type such as a, b, v, w, and x, and we
will denote scalars in lowercase italic type such as a, k, v, w, and x. When we want
to indicate that a vector v has initial point A and terminal point B, then, as shown in
Figure 3.1.2, we will write

$\mathbf{v} = \overrightarrow{AB}$

▲ Figure 3.1.2


Vectors with the same length and direction, such as those in Figure 3.1.3, are said to 
be equivalent. Since we want a vector to be determined solely by its length and direction, 
equivalent vectors are regarded as the same vector even though they may be in different 
positions. Equivalent vectors are also said to be equal, which we indicate by writing 

v = w 

The vector whose initial and terminal points coincide has length zero, so we call this 
the zero vector and denote it by 0. The zero vector has no natural direction, so we will 
agree that it can be assigned any direction that is convenient for the problem at hand. 


Vector Addition

There are a number of important algebraic operations on vectors, all of which have their
origin in laws of physics.



Equivalent vectors 

▲ Figure 3.1.3 


Parallelogram Rule for Vector Addition If v and w are vectors in 2-space or 3-space 
that are positioned so their initial points coincide, then the two vectors form adjacent 
sides of a parallelogram, and the sum v + w is the vector represented by the arrow 
from the common initial point of v and w to the opposite vertex of the parallelogram 
(Figure 3.1.4a). 


Here is another way to form the sum of two vectors. 


Triangle Rule for Vector Addition If v and w are vectors in 2-space or 3-space that are 
positioned so the initial point of w is at the terminal point of v, then the sum v + w 
is represented by the arrow from the initial point of v to the terminal point of w 
(Figure 3.1.4b).


In Figure 3.1.4c we have constructed the sums v + w and w + v by the triangle rule. 
This construction makes it evident that 

v + w = w + v (1) 

and that the sum obtained by the triangle rule is the same as the sum obtained by the 
parallelogram rule. 





▲ Figure 3.1.4 (a) (b) (c)

Vector addition can also be viewed as a process of translating points. 


Vector Addition Viewed as Translation If v, w, and v + w are positioned so their initial
points coincide, then the terminal point of v + w can be viewed in two ways:

1. The terminal point of v + w is the point that results when the terminal point
of v is translated in the direction of w by a distance equal to the length of w
(Figure 3.1.5a).

2. The terminal point of v + w is the point that results when the terminal point
of w is translated in the direction of v by a distance equal to the length of v
(Figure 3.1.5b).

Accordingly, we say that v + w is the translation of v by w or, alternatively, the 
translation of w by v. 






Vector Subtraction

In ordinary arithmetic we can write $a - b = a + (-b)$, which expresses subtraction in
terms of addition. There is an analogous idea in vector arithmetic.


Vector Subtraction The negative of a vector v, denoted by $-\mathbf{v}$, is the vector that has
the same length as v but is oppositely directed (Figure 3.1.6a), and the difference of v
from w, denoted by $\mathbf{w} - \mathbf{v}$, is taken to be the sum

$\mathbf{w} - \mathbf{v} = \mathbf{w} + (-\mathbf{v})$ (2)


The difference of v from w can be obtained geometrically by the parallelogram
method shown in Figure 3.1.6b, or more directly by positioning w and v so their initial
points coincide and drawing the vector from the terminal point of v to the terminal
point of w (Figure 3.1.6c).





Scalar Multiplication

Sometimes there is a need to change the length of a vector or change its length and
reverse its direction. This is accomplished by a type of multiplication in which vectors
are multiplied by scalars. As an example, the product 2v denotes the vector that has the
same direction as v but twice the length, and the product −2v denotes the vector that is
oppositely directed to v and has twice the length. Here is the general result.


Scalar Multiplication If v is a nonzero vector in 2-space or 3-space, and if k is a
nonzero scalar, then we define the scalar product of v by k to be the vector whose
length is |k| times the length of v and whose direction is the same as that of v if k is
positive and opposite to that of v if k is negative. If k = 0 or v = 0, then we define kv
to be 0.


Figure 3.1.7 shows the geometric relationship between a vector v and some of its scalar
multiples. In particular, observe that (−1)v has the same length as v but is oppositely
directed; therefore,

$(-1)\mathbf{v} = -\mathbf{v}$ (3)


Parallel and Collinear Vectors

Suppose that v and w are vectors in 2-space or 3-space with a common initial point. If
one of the vectors is a scalar multiple of the other, then the vectors lie on a common line,
so it is reasonable to say that they are collinear (Figure 3.1.8a). However, if we translate
one of the vectors, as indicated in Figure 3.1.8b, then the vectors are parallel but
no longer collinear. This creates a linguistic problem because translating a vector does
not change it. The only way to resolve this problem is to agree that the terms parallel and



collinear mean the same thing when applied to vectors. Although the vector 0 has no 
clearly defined direction, we will regard it as parallel to all vectors when convenient. 




Sums of Three or More Vectors

Vector addition satisfies the associative law for addition, meaning that when we add three
vectors, say u, v, and w, it does not matter which two we add first; that is,

u + (v + w) = (u + v) + w 

It follows from this that there is no ambiguity in the expression u + v + w because the 
same result is obtained no matter how the vectors are grouped. 

A simple way to construct u + v + w is to place the vectors “tip to tail” in succession
and then draw the vector from the initial point of u to the terminal point of w (Figure
3.1.9a). The tip-to-tail method also works for four or more vectors (Figure 3.1.9b).
The tip-to-tail method makes it evident that if u, v, and w are vectors in 3-space with a 
common initial point, then u + v + w is the diagonal of the parallelepiped that has the 
three vectors as adjacent sides (Figure 3.1.9c). 




Vectors in Coordinate 
Systems 


The component forms of the
zero vector are 0 = (0, 0) in
2-space and 0 = (0, 0, 0) in
3-space.


Up until now we have discussed vectors without reference to a coordinate system. However,
as we will soon see, computations with vectors are much simpler to perform if a
coordinate system is present to work with.

If a vector v in 2-space or 3-space is positioned with its initial point at the origin of
a rectangular coordinate system, then the vector is completely determined by the coordinates
of its terminal point (Figure 3.1.10). We call these coordinates the components
of v relative to the coordinate system. We will write $\mathbf{v} = (v_1, v_2)$ to denote a vector v in
2-space with components $(v_1, v_2)$, and $\mathbf{v} = (v_1, v_2, v_3)$ to denote a vector v in 3-space
with components $(v_1, v_2, v_3)$.




► Figure 3.1.10

▲ Figure 3.1.11 The ordered pair $(v_1, v_2)$ can represent a point or a vector.

Vectors Whose Initial Point 
Is Not at the Origin 


▲ Figure 3.1.12 $\mathbf{v} = \overrightarrow{P_1P_2} = \overrightarrow{OP_2} - \overrightarrow{OP_1}$


n-Space 


It should be evident geometrically that two vectors in 2-space or 3-space are equivalent
if and only if they have the same terminal point when their initial points are at
the origin. Algebraically, this means that two vectors are equivalent if and only if their
corresponding components are equal. Thus, for example, the vectors

$\mathbf{v} = (v_1, v_2, v_3)$ and $\mathbf{w} = (w_1, w_2, w_3)$

in 3-space are equivalent if and only if

$v_1 = w_1, \quad v_2 = w_2, \quad v_3 = w_3$

Remark It may have occurred to you that an ordered pair $(v_1, v_2)$ can represent either a vector
with components $v_1$ and $v_2$ or a point with coordinates $v_1$ and $v_2$ (and similarly for ordered triples).
Both are valid geometric interpretations, so the appropriate choice will depend on the geometric
viewpoint that we want to emphasize (Figure 3.1.11).

It is sometimes necessary to consider vectors whose initial points are not at the origin.
If $\overrightarrow{P_1P_2}$ denotes the vector with initial point $P_1(x_1, y_1)$ and terminal point $P_2(x_2, y_2)$,
then the components of this vector are given by the formula

$\overrightarrow{P_1P_2} = (x_2 - x_1, y_2 - y_1)$ (4)

That is, the components of $\overrightarrow{P_1P_2}$ are obtained by subtracting the coordinates of the
initial point from the coordinates of the terminal point. For example, in Figure 3.1.12
the vector $\overrightarrow{P_1P_2}$ is the difference of vectors $\overrightarrow{OP_2}$ and $\overrightarrow{OP_1}$, so

$\overrightarrow{P_1P_2} = \overrightarrow{OP_2} - \overrightarrow{OP_1} = (x_2, y_2) - (x_1, y_1) = (x_2 - x_1, y_2 - y_1)$

As you might expect, the components of a vector in 3-space that has initial point
$P_1(x_1, y_1, z_1)$ and terminal point $P_2(x_2, y_2, z_2)$ are given by

$\overrightarrow{P_1P_2} = (x_2 - x_1, y_2 - y_1, z_2 - z_1)$ (5)


► EXAMPLE 1 Finding the Components of a Vector

The components of the vector $\mathbf{v} = \overrightarrow{P_1P_2}$ with initial point $P_1(2, -1, 4)$ and terminal
point $P_2(7, 5, -8)$ are

$\mathbf{v} = (7 - 2, 5 - (-1), (-8) - 4) = (5, 6, -12)$ ◄
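
Formulas (4) and (5) translate directly into a few lines of code. The following Python sketch is only an illustration: the function name `components` is a hypothetical choice and does not come from the text. It subtracts the coordinates of the initial point from those of the terminal point and reproduces the result of Example 1.

```python
def components(initial, terminal):
    """Components of the vector from point `initial` to point `terminal`."""
    if len(initial) != len(terminal):
        raise ValueError("both points must have the same number of coordinates")
    # Formulas (4) and (5): terminal coordinate minus initial coordinate.
    return tuple(t - i for i, t in zip(initial, terminal))


# Example 1: initial point (2, -1, 4), terminal point (7, 5, -8).
v = components((2, -1, 4), (7, 5, -8))
print(v)  # (5, 6, -12)
```

Because the function works coordinate by coordinate, the same code covers points in 2-space, 3-space, or any higher number of dimensions.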

The idea of using ordered pairs and triples of real numbers to represent points in two- 
dimensional space and three-dimensional space was well known in the eighteenth and 
nineteenth centuries. By the dawn of the twentieth century, mathematicians and physi- 
cists were exploring the use of “higher dimensional” spaces in mathematics and physics. 
Today, even the layman is familiar with the notion of time as a fourth dimension, an idea 
used by Albert Einstein in developing the general theory of relativity. Today, physicists 
working in the field of “string theory” commonly use 11-dimensional space in their quest
for a unified theory that will explain how the fundamental forces of nature work. Much 
of the remaining work in this section is concerned with extending the notion of space to 
n dimensions. 

To explore these ideas further, we start with some terminology and notation. The
set of all real numbers can be viewed geometrically as a line. It is called the real line and
is denoted by $R$ or $R^1$. The superscript reinforces the intuitive idea that a line is one-dimensional.
The set of all ordered pairs of real numbers (called 2-tuples) and the set of all
ordered triples of real numbers (called 3-tuples) are denoted by $R^2$ and $R^3$, respectively.
The superscript reinforces the idea that the ordered pairs correspond to points in the
plane (two-dimensional) and ordered triples to points in space (three-dimensional). The
following definition extends this idea.


DEFINITION 1 If n is a positive integer, then an ordered n-tuple is a sequence of n
real numbers $(v_1, v_2, \ldots, v_n)$. The set of all ordered n-tuples is called n-space and is
denoted by $R^n$.


Remark You can think of the numbers in an n-tuple $(v_1, v_2, \ldots, v_n)$ as either the coordinates of
a generalized point or the components of a generalized vector, depending on the geometric image
you want to bring to mind; the choice makes no difference mathematically, since it is the algebraic
properties of n-tuples that are of concern.

Here are some typical applications that lead to n-tuples.

Experimental Data: A scientist performs an experiment and makes n numerical
measurements each time the experiment is performed. The result of each experiment
can be regarded as a vector $\mathbf{y} = (y_1, y_2, \ldots, y_n)$ in $R^n$ in which $y_1, y_2, \ldots, y_n$ are
the measured values.

Storage and Warehousing: A national trucking company has 15 depots for storing
and servicing its trucks. At each point in time the distribution of trucks in the service
depots can be described by a 15-tuple $\mathbf{x} = (x_1, x_2, \ldots, x_{15})$ in which $x_1$ is the number
of trucks in the first depot, $x_2$ is the number in the second depot, and so forth.

Electrical Circuits: A certain kind of processing chip is designed to receive four
input voltages and produce three output voltages in response. The input voltages
can be regarded as vectors in $R^4$ and the output voltages as vectors in $R^3$. Thus, the
chip can be viewed as a device that transforms an input vector $\mathbf{v} = (v_1, v_2, v_3, v_4)$ in
$R^4$ into an output vector $\mathbf{w} = (w_1, w_2, w_3)$ in $R^3$.

Graphical Images: One way in which color images are created on computer screens
is by assigning each pixel (an addressable point on the screen) three numbers that
describe the hue, saturation, and brightness of the pixel. Thus, a complete color image
can be viewed as a set of 5-tuples of the form $\mathbf{v} = (x, y, h, s, b)$ in which x and y are
the screen coordinates of a pixel and h, s, and b are its hue, saturation, and brightness.

Economics: One approach to economic analysis is to divide an economy into sectors
(manufacturing, services, utilities, and so forth) and measure the output of each sector
by a dollar value. Thus, in an economy with 10 sectors the economic output of the
entire economy can be represented by a 10-tuple $\mathbf{s} = (s_1, s_2, \ldots, s_{10})$ in which the
numbers $s_1, s_2, \ldots, s_{10}$ are the outputs of the individual sectors.



Albert Einstein 
(1879-1955) 


The German-born physicist Albert Einstein 
immigrated to the United States in 1935, where he settled at 
Princeton University. Einstein spent the last three decades 
of his life working unsuccessfully at producing a unified field 
theory that would establish an underlying link between the 
forces of gravity and electromagnetism. Recently, physi- 
cists have made progress on the problem using a frame- 
work known as string theory. In this theory the smallest, 
indivisible components of the Universe are not particles but 
loops that behave like vibrating strings. Whereas Einstein's 
space-time universe was four-dimensional, strings reside in 
an 11-dimensional world that is the focus of current re- 
search. 

[Image: © Bettmann/CORBIS ]


Mechanical Systems: Suppose that six particles move along the same coordinate
line so that at time t their coordinates are $x_1, x_2, \ldots, x_6$ and their velocities are
$v_1, v_2, \ldots, v_6$, respectively. This information can be represented by the vector

$\mathbf{v} = (x_1, x_2, x_3, x_4, x_5, x_6, v_1, v_2, v_3, v_4, v_5, v_6, t)$

in $R^{13}$. This vector is called the state of the particle system at time t.


Operations on Vectors in $R^n$


Our next goal is to define useful operations on vectors in $R^n$. These operations will all
be natural extensions of the familiar operations on vectors in $R^2$ and $R^3$. We will denote
a vector v in $R^n$ using the notation

$\mathbf{v} = (v_1, v_2, \ldots, v_n)$

and we will call $\mathbf{0} = (0, 0, \ldots, 0)$ the zero vector.

We noted earlier that in $R^2$ and $R^3$ two vectors are equivalent (equal) if and only if
their corresponding components are the same. Thus, we make the following definition.


DEFINITION 2 Vectors $\mathbf{v} = (v_1, v_2, \ldots, v_n)$ and $\mathbf{w} = (w_1, w_2, \ldots, w_n)$ in $R^n$ are said
to be equivalent (also called equal) if

$v_1 = w_1, \quad v_2 = w_2, \quad \ldots, \quad v_n = w_n$

We indicate this by writing v = w.


► EXAMPLE 2 Equality of Vectors

$(a, b, c, d) = (1, -4, 2, 7)$

if and only if a = 1, b = −4, c = 2, and d = 7.

Our next objective is to define the operations of addition, subtraction, and scalar
multiplication for vectors in $R^n$. To motivate these ideas, we will consider how these operations
can be performed on vectors in $R^2$ using components. By studying Figure 3.1.13
you should be able to deduce that if $\mathbf{v} = (v_1, v_2)$ and $\mathbf{w} = (w_1, w_2)$, then

$\mathbf{v} + \mathbf{w} = (v_1 + w_1, v_2 + w_2)$ (6)

$k\mathbf{v} = (kv_1, kv_2)$ (7)

In particular, it follows from (7) that

$-\mathbf{v} = (-1)\mathbf{v} = (-v_1, -v_2)$ (8)




► Figure 3.1.13

and hence that

$\mathbf{w} - \mathbf{v} = \mathbf{w} + (-\mathbf{v}) = (w_1 - v_1, w_2 - v_2)$ (9)

Motivated by Formulas (6)-(9), we make the following definition.


DEFINITION 3 If $\mathbf{v} = (v_1, v_2, \ldots, v_n)$ and $\mathbf{w} = (w_1, w_2, \ldots, w_n)$ are vectors in $R^n$,
and if k is any scalar, then we define

$\mathbf{v} + \mathbf{w} = (v_1 + w_1, v_2 + w_2, \ldots, v_n + w_n)$ (10)

$k\mathbf{v} = (kv_1, kv_2, \ldots, kv_n)$ (11)

$-\mathbf{v} = (-v_1, -v_2, \ldots, -v_n)$ (12)

$\mathbf{w} - \mathbf{v} = \mathbf{w} + (-\mathbf{v}) = (w_1 - v_1, w_2 - v_2, \ldots, w_n - v_n)$ (13)


In words, vectors are added (or 
subtracted) by adding (or sub- 
tracting) their corresponding 
components, and a vector is 
multiplied by a scalar by multi- 
plying each component by that 
scalar. 


► EXAMPLE 3 Algebraic Operations Using Components 

If v = (1, −3, 2) and w = (4, 2, 1), then

$\mathbf{v} + \mathbf{w} = (5, -1, 3), \quad 2\mathbf{v} = (2, -6, 4)$

$-\mathbf{w} = (-4, -2, -1), \quad \mathbf{v} - \mathbf{w} = \mathbf{v} + (-\mathbf{w}) = (-3, -5, 1)$ ◄
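
The componentwise operations in Formulas (10)-(13) can be sketched in Python as follows. The function names `add`, `scale`, `negate`, and `subtract` are hypothetical choices made for this illustration, and vectors are represented as plain tuples. The printed values match Example 3.

```python
def add(v, w):
    """Formula (10): add corresponding components."""
    if len(v) != len(w):
        raise ValueError("vectors must have the same number of components")
    return tuple(a + b for a, b in zip(v, w))


def scale(k, v):
    """Formula (11): multiply every component by the scalar k."""
    return tuple(k * a for a in v)


def negate(v):
    """Formula (12): the negative of v is (-1)v."""
    return scale(-1, v)


def subtract(w, v):
    """Formula (13): w - v is defined as w + (-v)."""
    return add(w, negate(v))


# The vectors of Example 3.
v = (1, -3, 2)
w = (4, 2, 1)
print(add(v, w))       # (5, -1, 3)
print(scale(2, v))     # (2, -6, 4)
print(negate(w))       # (-4, -2, -1)
print(subtract(v, w))  # (-3, -5, 1)
```

Defining `subtract` through `add` and `negate`, rather than subtracting components directly, mirrors the way the definition builds subtraction out of addition.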


The following theorem summarizes the most important properties of vector opera- 
tions. 


THEOREM 3.1.1 If u, v, and w are vectors in $R^n$, and if k and m are scalars, then:

(a) u + v = v + u

(b) (u + v) + w = u + (v + w)

(c) u + 0 = 0 + u = u

(d) u + (−u) = 0

(e) k(u + v) = ku + kv

(f) (k + m)u = ku + mu

(g) k(mu) = (km)u

(h) 1u = u


We will prove part (b) and leave some of the other proofs as exercises.


Proof (b) Let $\mathbf{u} = (u_1, u_2, \ldots, u_n)$, $\mathbf{v} = (v_1, v_2, \ldots, v_n)$, and $\mathbf{w} = (w_1, w_2, \ldots, w_n)$.
Then

$(\mathbf{u} + \mathbf{v}) + \mathbf{w} = ((u_1, u_2, \ldots, u_n) + (v_1, v_2, \ldots, v_n)) + (w_1, w_2, \ldots, w_n)$

$= (u_1 + v_1, u_2 + v_2, \ldots, u_n + v_n) + (w_1, w_2, \ldots, w_n)$ [Vector addition]

$= ((u_1 + v_1) + w_1, (u_2 + v_2) + w_2, \ldots, (u_n + v_n) + w_n)$ [Vector addition]

$= (u_1 + (v_1 + w_1), u_2 + (v_2 + w_2), \ldots, u_n + (v_n + w_n))$ [Regroup]

$= (u_1, u_2, \ldots, u_n) + (v_1 + w_1, v_2 + w_2, \ldots, v_n + w_n)$ [Vector addition]

$= \mathbf{u} + (\mathbf{v} + \mathbf{w})$
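
A numerical spot check is no substitute for a proof, but it can help catch a misreading of the properties. The Python sketch below tests parts (a)-(h) of Theorem 3.1.1 on one set of sample vectors. The vectors, the scalars, and the helper names `add` and `scale` are all assumptions made for this illustration; integer components are used so that the comparisons are exact.

```python
def add(u, v):
    """Componentwise sum of two vectors of equal length."""
    return tuple(a + b for a, b in zip(u, v))


def scale(k, u):
    """Scalar multiple k*u."""
    return tuple(k * a for a in u)


# Hypothetical sample data in R^4; integers keep the arithmetic exact.
u, v, w = (1, -2, 3, 0), (4, 0, -1, 5), (-3, 7, 2, 2)
k, m = 3, -2
zero = (0, 0, 0, 0)

checks = {
    "(a)": add(u, v) == add(v, u),
    "(b)": add(add(u, v), w) == add(u, add(v, w)),
    "(c)": add(u, zero) == u and add(zero, u) == u,
    "(d)": add(u, scale(-1, u)) == zero,
    "(e)": scale(k, add(u, v)) == add(scale(k, u), scale(k, v)),
    "(f)": scale(k + m, u) == add(scale(k, u), scale(m, u)),
    "(g)": scale(k, scale(m, u)) == scale(k * m, u),
    "(h)": scale(1, u) == u,
}
for part, holds in checks.items():
    print(part, holds)
```

Each line of output pairs a part of the theorem with `True` for this sample. With floating-point components, rounding could make some of these equalities fail, which is one reason the sketch sticks to integers.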


The following additional properties of vectors in $R^n$ can be deduced easily by expressing
the vectors in terms of components (verify).





Calculating Without 
Components 


Linear Combinations 


Note that this definition of a 
linear combination is consis- 
tent with that given in the con- 
text of matrices (see Definition 
6 in Section 1.3). 


Alternative Notations for 
Vectors 


THEOREM 3.1.2 If v is a vector in $R^n$ and k is a scalar, then:

(a) 0v = 0

(b) k0 = 0

(c) (−1)v = −v


One of the powerful consequences of Theorems 3.1.1 and 3.1.2 is that they allow calculations
to be performed without expressing the vectors in terms of components. For
example, suppose that x, a, and b are vectors in $R^n$, and we want to solve the vector
equation x + a = b for the vector x without using components. We could proceed as
follows:

x + a = b [Given]

(x + a) + (−a) = b + (−a) [Add the negative of a to both sides]

x + (a + (−a)) = b − a [Part (b) of Theorem 3.1.1]

x + 0 = b − a [Part (d) of Theorem 3.1.1]

x = b − a [Part (c) of Theorem 3.1.1]

While this method is obviously more cumbersome than computing with components in
$R^n$, it will become important later in the text where we will encounter more general kinds
of vectors.
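
For vectors that are given by components, the conclusion x = b − a can be checked by substituting back into the original equation. The sketch below assumes sample vectors in $R^4$ and a hypothetical helper named `solve_translation`; neither comes from the text.

```python
def add(u, v):
    """Componentwise sum of two vectors of equal length."""
    return tuple(p + q for p, q in zip(u, v))


def negate(u):
    """The negative of u."""
    return tuple(-p for p in u)


def solve_translation(a, b):
    """Return the vector x satisfying x + a = b, namely x = b + (-a)."""
    return add(b, negate(a))


# Hypothetical sample vectors in R^4.
a = (2, -1, 0, 5)
b = (7, 3, -4, 1)
x = solve_translation(a, b)
print(x)               # (5, 4, -4, -4)
print(add(x, a) == b)  # True
```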

Addition, subtraction, and scalar multiplication are frequently used in combination to
form new vectors. For example, if $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$ are vectors in $R^n$, then the vectors

$\mathbf{u} = 2\mathbf{v}_1 + 3\mathbf{v}_2 + \mathbf{v}_3$ and $\mathbf{w} = 7\mathbf{v}_1 - 6\mathbf{v}_2 + 8\mathbf{v}_3$

are formed in this way. In general, we make the following definition.


DEFINITION 4 If w is a vector in $R^n$, then w is said to be a linear combination of the
vectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r$ in $R^n$ if it can be expressed in the form

$\mathbf{w} = k_1\mathbf{v}_1 + k_2\mathbf{v}_2 + \cdots + k_r\mathbf{v}_r$ (14)

where $k_1, k_2, \ldots, k_r$ are scalars. These scalars are called the coefficients of the linear
combination. In the case where r = 1, Formula (14) becomes $\mathbf{w} = k_1\mathbf{v}_1$, so that a
linear combination of a single vector is just a scalar multiple of that vector.
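
Formula (14) can be evaluated one component at a time. The Python sketch below is an illustration under stated assumptions: the function name `linear_combination` and the three sample vectors in $R^3$ are hypothetical, while the coefficient patterns 2, 3, 1 and 7, −6, 8 are the ones used in the text just before the definition.

```python
def linear_combination(coefficients, vectors):
    """Return k_1*v_1 + k_2*v_2 + ... + k_r*v_r as a tuple (Formula (14))."""
    if not vectors or len(coefficients) != len(vectors):
        raise ValueError("need one coefficient for each vector, and at least one vector")
    n = len(vectors[0])
    if any(len(v) != n for v in vectors):
        raise ValueError("all vectors must have the same number of components")
    # Component i of the result is k_1*v_1[i] + ... + k_r*v_r[i].
    return tuple(
        sum(k * v[i] for k, v in zip(coefficients, vectors)) for i in range(n)
    )


# Hypothetical sample vectors in R^3.
v1, v2, v3 = (1, 0, 2), (0, 1, -1), (3, 1, 0)
print(linear_combination((2, 3, 1), (v1, v2, v3)))   # (5, 4, 1)
print(linear_combination((7, -6, 8), (v1, v2, v3)))  # (31, 2, 20)
print(linear_combination((4,), (v1,)))               # (4, 0, 8), a scalar multiple
```

The last call shows the case r = 1, where the linear combination reduces to a scalar multiple of a single vector.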


Up to now we have been writing vectors in $R^n$ using the notation

$\mathbf{v} = (v_1, v_2, \ldots, v_n)$ (15)

We call this the comma-delimited form. However, since a vector in $R^n$ is just a list of
its n components in a specific order, any notation that displays those components in the
correct order is a valid way of representing the vector. For example, the vector in (15)
can be written as

$\mathbf{v} = [v_1 \;\; v_2 \;\; \cdots \;\; v_n]$ (16)

which is called row-vector form, or as

$\mathbf{v} = \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix}$ (17)

which is called column-vector form. The choice of notation is often a matter of taste or
convenience, but sometimes the nature of a problem will suggest a preferred notation.
Notations (15), (16), and (17) will all be used at various places in this text.


Application of Linear Combinations to Color Models 

Colors on computer monitors are commonly based on what is called 
the RGB color model. Colors in this system are created by adding 
together percentages of the primary colors red (R), green (G), and 
blue (B). One way to do this is to identify the primary colors with 
the vectors 

$\mathbf{r} = (1, 0, 0)$ (pure red),
$\mathbf{g} = (0, 1, 0)$ (pure green),
$\mathbf{b} = (0, 0, 1)$ (pure blue)

in $R^3$ and to create all other colors by forming linear combinations
of r, g, and b using coefficients between 0 and 1, inclusive; these
coefficients represent the percentage of each pure color in the mix.

The set of all such color vectors is called RGB space or the RGB
color cube (Figure 3.1.14). Thus, each color vector c in this cube is
expressible as a linear combination of the form

$\mathbf{c} = k_1\mathbf{r} + k_2\mathbf{g} + k_3\mathbf{b}$
$= k_1(1, 0, 0) + k_2(0, 1, 0) + k_3(0, 0, 1)$
$= (k_1, k_2, k_3)$

where $0 \le k_i \le 1$. As indicated in the figure, the corners of the cube
represent the pure primary colors together with the colors black,
white, magenta, cyan, and yellow. The vectors along the diagonal
running from black to white correspond to shades of gray.
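
The color model can be sketched in Python as a linear combination of the three primary vectors. The function name `color` and the constant names `R`, `G`, and `B` are hypothetical choices for this illustration; the range check reflects the requirement that each coefficient lie between 0 and 1, inclusive.

```python
R = (1, 0, 0)  # pure red
G = (0, 1, 0)  # pure green
B = (0, 0, 1)  # pure blue


def color(k1, k2, k3):
    """Return the color vector k1*r + k2*g + k3*b in the RGB cube."""
    for k in (k1, k2, k3):
        if not 0 <= k <= 1:
            raise ValueError("each coefficient must lie between 0 and 1, inclusive")
    return tuple(k1 * r + k2 * g + k3 * b for r, g, b in zip(R, G, B))


print(color(1, 1, 0))        # (1, 1, 0), yellow
print(color(1, 0, 1))        # (1, 0, 1), magenta
print(color(0.5, 0.5, 0.5))  # (0.5, 0.5, 0.5), a gray on the black-white diagonal
```

Because the primary vectors are the standard unit vectors, the result is always the coefficient triple itself, in agreement with the last line of the displayed computation.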


► Figure 3.1.14 Corner labels: Black (0, 0, 0), Red (1, 0, 0), Green (0, 1, 0), Blue (0, 0, 1),
Yellow (1, 1, 0), Magenta (1, 0, 1), Cyan (0, 1, 1), White (1, 1, 1)


Exercise Set 3.1 

In Exercises 1-2, find the components of the vector. 



In Exercises 3-4, find the components of the vector $\overrightarrow{P_1P_2}$.

3. (a) $P_1(3, 5)$, $P_2(2, 8)$ (b) $P_1(5, -2, 1)$, $P_2(2, 4, 2)$

4. (a) $P_1(-6, 2)$, $P_2(-4, -1)$ (b) $P_1(0, 0, 0)$, $P_2(-1, 6, 1)$

5. (a) Find the terminal point of the vector that is equivalent to
u = (1, 2) and whose initial point is A(1, 1).

(b) Find the initial point of the vector that is equivalent to
u = (1, 1, 3) and whose terminal point is B(−1, −1, 2).

6. (a) Find the initial point of the vector that is equivalent to
u = (1, 2) and whose terminal point is B(2, 0).

(b) Find the terminal point of the vector that is equivalent to
u = (1, 1, 3) and whose initial point is A(0, 2, 0).

7. Find an initial point P of a nonzero vector $\mathbf{u} = \overrightarrow{PQ}$ with terminal
point Q(3, 0, −5) and such that

(a) u has the same direction as v = (4, −2, −1).

(b) u is oppositely directed to v = (4, −2, −1).

8. Find a terminal point Q of a nonzero vector $\mathbf{u} = \overrightarrow{PQ}$ with
initial point P(−1, 3, −5) and such that

(a) u has the same direction as v = (6, 7, −3).

(b) u is oppositely directed to v = (6, 7, −3).

9. Let u = (4, —1), v = (0, 5), and w = (—3, —3). Find the 
components of 

(a) u + w (b) v - 3u 

(c) 2(u - 5w) (d) 3v - 2(u + 2w) 

10. Let u= (-3,1,2), v=(4, 0, -8), and w = (6, -1,-4). 
Find the components of 

(a) v — w (b) 6u + 2v 

(c) — 3(v — 8w) (d) (2u — 7w) — (8v + u) 

11. Let u = (-3, 2, 1, 0), v = (4, 7, -3, 2), and 
w = (5, —2, 8, 1). Find the components of 

(a) v — w (b) — u + (v — 4w) 

(c) 6(u — 3v) (d) (6v — w) — (4u + v) 

12. Let u = (1, 2, −3, 5, 0), v = (0, 4, −1, 1, 2), and
w = (7, 1, −4, −2, 3). Find the components of

(a) v + w (b) 3(2u — v) 

(c) (3u - v) - (2u + 4w) (d) \ (w - 5v + 2u) + v 

13. Let u, v, and w be the vectors in Exercise 11. Find the components
of the vector x that satisfies the equation

3u + v − 2w = 3x + 2w.

14. Let u, v, and w be the vectors in Exercise 12. Find the com- 
ponents of the vector x that satisfies the equation 

2u — v + x = 7x + w. 

15. Which of the following vectors in $R^6$, if any, are parallel to
u = (−2, 1, 0, 3, 5, 1)?

(a) (4, 2, 0, 6, 10, 2)

(b) (4, −2, 0, −6, −10, −2)

(c) (0, 0, 0, 0, 0, 0)

16. For what value(s) of t, if any, is the given vector parallel to
u = (4, −1)?

(a) $(8t, -2)$ (b) $(8t, 2t)$ (c) $(1, t^2)$

17. Let u = (1, −1, 3, 5) and v = (2, 1, 0, −3). Find scalars a and
b so that au + bv = (1, −4, 9, 18).

18. Let u = (2, 1, 0, 1, −1) and v = (−2, 3, 1, 0, 2). Find scalars
a and b so that au + bv = (−8, 8, 3, −1, 7).

In Exercises 19-20, find scalars $c_1$, $c_2$, and $c_3$ for which the
equation is satisfied.

19. $c_1(1, -1, 0) + c_2(3, 2, 1) + c_3(0, 1, 4) = (-1, 1, 19)$

20. $c_1(-1, 0, 2) + c_2(2, 2, -2) + c_3(1, -2, 1) = (-6, 12, 4)$


22. Show that there do not exist scalars $c_1$, $c_2$, and $c_3$ such that
$c_1(1, 0, 1, 0) + c_2(1, 0, -2, 1) + c_3(2, 0, 1, 2) = (1, -2, 2, 3)$

23. Let P be the point (2, 3, —2) and Q the point (7, —4, 1). 

(a) Find the midpoint of the line segment connecting the 
points P and Q. 

(b) Find the point on the line segment connecting the points 
P and Q that is | of the way from P to Q. 

24. In relation to the points Pi and P 2 in Figure 3.1.12, what can 
you say about the terminal point of the following vector if its 
initial point is at the origin? 



25. In each part, find the components of the vector u + v + w. 




26. Referring to the vectors pictured in Exercise 25, find the com- 
ponents of the vector u — v + w. 

27. Let P be the point (1, 3, 7). If the point (4, 0, −6) is the midpoint
of the line segment connecting P and Q, what is Q?

28. If the sum of three vectors in R 3 is zero, must they lie in the 
same plane? Explain. 

29. Consider the regular hexagon shown in the accompanying fig- 
ure. 

(a) What is the sum of the six radial vectors that run from the 
center to the vertices? 

(b) How is the sum affected if each radial vector is multiplied 
by i? 

(c) What is the sum of the five radial vectors that remain if a 
is removed? 

(d) Discuss some variations and generalizations of the result 
in part (c). 


a 



◄ Figure Ex-29 


21. Show that there do not exist scalars $c_1$, $c_2$, and $c_3$ such that
$c_1(-2, 9, 6) + c_2(-3, 2, 1) + c_3(1, 7, 5) = (0, 5, 4)$


30. What is the sum of all radial vectors of a regular n-sided polygon?
(See Exercise 29.)

Working with Proofs

31. Prove parts (a), (c), and ( d ) of Theorem 3.1.1. 

32. Prove parts (e)-(h) of Theorem 3.1.1. 

33. Prove parts (a)-(c) of Theorem 3.1.2.

True-False Exercises 

TF. In parts (a)-(k) determine whether the statement is true or 
false, and justify your answer. 

(a) Two equivalent vectors must have the same initial point. 

(b) The vectors (a, b) and (a, b, 0) are equivalent. 

(c) If k is a scalar and v is a vector, then v and kv are parallel if
and only if k > 0.

(d) The vectors v + (u + w) and (w + v) + u are the same.

(e) If u + v = u + w, then v = w.

(f) If a and b are scalars such that au + bv = 0, then u and v are
parallel vectors.

(g) Collinear vectors with the same length are equal.

(h) If (a, b, c) + (x, y, z) = (x, y, z), then (a, b, c) must be the
zero vector.

(i) If k and m are scalars and u and v are vectors, then

(k + m)(u + v) = ku + mv

(j) If the vectors v and w are given, then the vector equation

3(2v − x) = 5x − 4w + v

can be solved for x.

(k) The linear combinations $a_1\mathbf{v}_1 + a_2\mathbf{v}_2$ and $b_1\mathbf{v}_1 + b_2\mathbf{v}_2$ can only
be equal if $a_1 = b_1$ and $a_2 = b_2$.


3.2 Norm, Dot Product, and Distance in $R^n$

In this section we will be concerned with the notions of length and distance as they relate to
vectors. We will first discuss these ideas in $R^2$ and $R^3$ and then extend them algebraically
to $R^n$.


Norm of a Vector 



In this text we will denote the length of a vector v by the symbol ||v||, which is read as
the norm of v, the length of v, or the magnitude of v (the term “norm” being a common
mathematical synonym for length). As suggested in Figure 3.2.1a, it follows from the
Theorem of Pythagoras that the norm of a vector $(v_1, v_2)$ in $R^2$ is

$\|\mathbf{v}\| = \sqrt{v_1^2 + v_2^2}$ (1)

Similarly, for a vector $(v_1, v_2, v_3)$ in $R^3$, it follows from Figure 3.2.1b and two applications
of the Theorem of Pythagoras that

$\|\mathbf{v}\|^2 = (OR)^2 + (RP)^2 = (OQ)^2 + (QR)^2 + (RP)^2 = v_1^2 + v_2^2 + v_3^2$

and hence that

$\|\mathbf{v}\| = \sqrt{v_1^2 + v_2^2 + v_3^2}$ (2)

▲ Figure 3.2.1

Motivated by the pattern of Formulas (1) and (2), we make the following definition.


DEFINITION 1 Ifv = (iq, V 2 , ■ ■ ■ , v„) is a vector in R n , then the norm of v (also called 
the length of v or the magnitude of v) is denoted by || v|| , and is defined by the formula 

IM| = V v 2 + v 2 -] l-t; (3) 




► EXAMPLE 1 Calculating Norms

It follows from Formula (2) that the norm of the vector v = (-3, 2, 1) in $R^3$ is

$$\|\mathbf{v}\| = \sqrt{(-3)^2 + 2^2 + 1^2} = \sqrt{14}$$

and it follows from Formula (3) that the norm of the vector v = (2, -1, 3, -5) in $R^4$ is

$$\|\mathbf{v}\| = \sqrt{2^2 + (-1)^2 + 3^2 + (-5)^2} = \sqrt{39}$$ ◄
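
The computations in Example 1 can be reproduced with a few lines of code. The following is a minimal Python sketch of Formula (3); the function name `norm` is an illustrative choice and not notation from the text.

```python
import math

def norm(v):
    """Euclidean norm of a vector given as a sequence of numbers (Formula (3))."""
    return math.sqrt(sum(x * x for x in v))

# The two vectors of Example 1
print(norm((-3, 2, 1)))       # sqrt(14), about 3.7417
print(norm((2, -1, 3, -5)))   # sqrt(39), about 6.2450
```

Because the function only sums squares of components, the same code works in any number of dimensions.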

Our first theorem in this section will generalize to $R^n$ the following three familiar
facts about vectors in $R^2$ and $R^3$:

Distances are nonnegative.

The zero vector is the only vector of length zero.

Multiplying a vector by a scalar multiplies its length by the absolute value of that
scalar.

It is important to recognize that just because these results hold in $R^2$ and $R^3$ does not
guarantee that they hold in $R^n$; their validity in $R^n$ must be proved using algebraic
properties of n-tuples.


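THEOREM 3.2.1 If v is a vector in $R^n$, and if k is any scalar, then:

(a) $\|\mathbf{v}\| \ge 0$

(b) $\|\mathbf{v}\| = 0$ if and only if $\mathbf{v} = \mathbf{0}$

(c) $\|k\mathbf{v}\| = |k|\,\|\mathbf{v}\|$


We will prove part (c) and leave (a) and (b) as exercises.

Proof (c) If $\mathbf{v} = (v_1, v_2, \ldots, v_n)$, then $k\mathbf{v} = (kv_1, kv_2, \ldots, kv_n)$, so

$$\begin{aligned}
\|k\mathbf{v}\| &= \sqrt{(kv_1)^2 + (kv_2)^2 + \cdots + (kv_n)^2} \\
&= \sqrt{(k^2)(v_1^2 + v_2^2 + \cdots + v_n^2)} \\
&= |k|\sqrt{v_1^2 + v_2^2 + \cdots + v_n^2} \\
&= |k|\,\|\mathbf{v}\|
\end{aligned}$$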


Unit Vectors 


A vector of norm 1 is called a unit vector. Such vectors are useful for specifying a
direction when length is not relevant to the problem at hand. You can obtain a unit vector
in a desired direction by choosing any nonzero vector v in that direction and multiplying
v by the reciprocal of its length. For example, if v is a vector of length 2 in $R^2$ or $R^3$,
then $\frac{1}{2}\mathbf{v}$ is a unit vector in the same direction as v. More generally, if v is any nonzero
vector in $R^n$, then

$$\mathbf{u} = \frac{1}{\|\mathbf{v}\|}\,\mathbf{v} \tag{4}$$

defines a unit vector that is in the same direction as v. We can confirm that (4) is a unit
vector by applying part (c) of Theorem 3.2.1 with $k = 1/\|\mathbf{v}\|$ to obtain

$$\|\mathbf{u}\| = \|k\mathbf{v}\| = |k|\,\|\mathbf{v}\| = k\|\mathbf{v}\| = \frac{1}{\|\mathbf{v}\|}\,\|\mathbf{v}\| = 1$$

WARNING Sometimes you will see Formula (4) expressed as

$$\mathbf{u} = \frac{\mathbf{v}}{\|\mathbf{v}\|}$$

This is just a more compact way of writing that formula and is not intended to convey
that v is being divided by ||v||.

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The process of multiplying a nonzero vector by the reciprocal of its length to obtain a 
unit vector is called normalizing v. 


► EXAMPLE 2 Normalizing a Vector

Find the unit vector u that has the same direction as v = (2, 2, -1).

Solution The vector v has length

$$\|\mathbf{v}\| = \sqrt{2^2 + 2^2 + (-1)^2} = 3$$

Thus, from (4)

$$\mathbf{u} = \tfrac{1}{3}(2, 2, -1) = \left(\tfrac{2}{3}, \tfrac{2}{3}, -\tfrac{1}{3}\right)$$

As a check, you may want to confirm that ||u|| = 1. ◄
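
The normalization in Example 2, together with the suggested check that ||u|| = 1, can be sketched in Python as follows. The function name `normalize` and the decision to raise an error for the zero vector are assumptions made for this illustration.

```python
import math

def normalize(v):
    """Return the unit vector (1/||v||) v of Formula (4); v must be nonzero."""
    length = math.sqrt(sum(x * x for x in v))
    if length == 0:
        raise ValueError("the zero vector cannot be normalized")
    return tuple(x / length for x in v)

u = normalize((2, 2, -1))                   # the vector of Example 2
print(u)                                    # components 2/3, 2/3, -1/3 up to rounding
print(math.sqrt(sum(x * x for x in u)))     # the norm of u, equal to 1 up to rounding
```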


The Standard Unit Vectors 


When a rectangular coordinate system is introduced in $R^2$ or $R^3$, the unit vectors in the
positive directions of the coordinate axes are called the standard unit vectors. In $R^2$ these
vectors are denoted by

i = (1, 0) and j = (0, 1)

and in $R^3$ by

i = (1, 0, 0), j = (0, 1, 0), and k = (0, 0, 1)

(Figure 3.2.2). Every vector $\mathbf{v} = (v_1, v_2)$ in $R^2$ and every vector $\mathbf{v} = (v_1, v_2, v_3)$ in $R^3$
can be expressed as a linear combination of standard unit vectors by writing

$$\mathbf{v} = (v_1, v_2) = v_1(1, 0) + v_2(0, 1) = v_1\mathbf{i} + v_2\mathbf{j} \tag{5}$$

$$\mathbf{v} = (v_1, v_2, v_3) = v_1(1, 0, 0) + v_2(0, 1, 0) + v_3(0, 0, 1) = v_1\mathbf{i} + v_2\mathbf{j} + v_3\mathbf{k} \tag{6}$$

Moreover, we can generalize these formulas to $R^n$ by defining the standard unit vectors
in $R^n$ to be

$$\mathbf{e}_1 = (1, 0, 0, \ldots, 0), \quad \mathbf{e}_2 = (0, 1, 0, \ldots, 0), \quad \ldots, \quad \mathbf{e}_n = (0, 0, 0, \ldots, 1) \tag{7}$$

in which case every vector $\mathbf{v} = (v_1, v_2, \ldots, v_n)$ in $R^n$ can be expressed as

$$\mathbf{v} = (v_1, v_2, \ldots, v_n) = v_1\mathbf{e}_1 + v_2\mathbf{e}_2 + \cdots + v_n\mathbf{e}_n \tag{8}$$


► EXAMPLE 3 Linear Combinations of Standard Unit Vectors

$(2, -3, 4) = 2\mathbf{i} - 3\mathbf{j} + 4\mathbf{k}$

$(7, 3, -4, 5) = 7\mathbf{e}_1 + 3\mathbf{e}_2 - 4\mathbf{e}_3 + 5\mathbf{e}_4$ ◄
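
Formula (8) says that the components of a vector are exactly its coefficients relative to the standard unit vectors. As a small sketch, the Python code below builds the vectors of Formula (7) and reassembles the second vector of Example 3; the helper names `standard_unit_vector` and `linear_combination` are hypothetical.

```python
def standard_unit_vector(i, n):
    """Return e_i in R^n as a tuple, using the 1-based index i of Formula (7)."""
    return tuple(1 if j == i else 0 for j in range(1, n + 1))

def linear_combination(coeffs, vectors):
    """Return c_1 v_1 + c_2 v_2 + ... for equal-length vectors."""
    n = len(vectors[0])
    return tuple(sum(c * vec[k] for c, vec in zip(coeffs, vectors)) for k in range(n))

v = (7, 3, -4, 5)
basis = [standard_unit_vector(i, len(v)) for i in range(1, len(v) + 1)]
rebuilt = linear_combination(v, basis)   # 7 e_1 + 3 e_2 - 4 e_3 + 5 e_4
print(rebuilt == v)                      # True
```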


Distance in $R^n$

If $P_1$ and $P_2$ are points in $R^2$ or $R^3$, then the length of the vector $\overrightarrow{P_1P_2}$ is equal to the
distance d between the two points (Figure 3.2.3). Specifically, if $P_1(x_1, y_1)$ and $P_2(x_2, y_2)$
are points in $R^2$, then Formula (4) of Section 3.1 implies that

$$d = \|\overrightarrow{P_1P_2}\| = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2} \tag{9}$$
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▲ Figure 3.2.3 

We noted in the previous section that n-tuples can be viewed either as vectors or
points in $R^n$. In Definition 2 we chose to describe them as points, as that seemed the
more natural interpretation.


Dot Product 


► Figure 3.2.4 


This is the familiar distance formula from analytic geometry. Similarly, the distance
between the points $P_1(x_1, y_1, z_1)$ and $P_2(x_2, y_2, z_2)$ in 3-space is

$$d(\mathbf{u}, \mathbf{v}) = \|\overrightarrow{P_1P_2}\| = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2 + (z_2 - z_1)^2} \tag{10}$$

Motivated by Formulas (9) and (10), we make the following definition.


DEFINITION 2 If $\mathbf{u} = (u_1, u_2, \ldots, u_n)$ and $\mathbf{v} = (v_1, v_2, \ldots, v_n)$ are points in $R^n$, then
we denote the distance between u and v by d(u, v) and define it to be

$$d(\mathbf{u}, \mathbf{v}) = \|\mathbf{u} - \mathbf{v}\| = \sqrt{(u_1 - v_1)^2 + (u_2 - v_2)^2 + \cdots + (u_n - v_n)^2} \tag{11}$$


► EXAMPLE 4 Calculating Distance in $R^n$

If

u = (1, 3, -2, 7) and v = (0, 7, 2, 2)

then the distance between u and v is

$$d(\mathbf{u}, \mathbf{v}) = \sqrt{(1 - 0)^2 + (3 - 7)^2 + (-2 - 2)^2 + (7 - 2)^2} = \sqrt{58}$$ ◄
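
Formula (11) translates directly into code. The sketch below is one possible Python rendering, checked against Example 4; the name `distance` and the length check are choices made for this illustration.

```python
import math

def distance(u, v):
    """Euclidean distance d(u, v) = ||u - v|| between two points of R^n (Formula (11))."""
    if len(u) != len(v):
        raise ValueError("points must have the same number of components")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

d = distance((1, 3, -2, 7), (0, 7, 2, 2))   # the points of Example 4
print(d)                                    # sqrt(58), about 7.6158
```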


Our next objective is to define a useful multiplication operation on vectors in $R^2$ and $R^3$
and then extend that operation to $R^n$. To do this we will first need to define exactly what
we mean by the “angle” between two vectors in $R^2$ or $R^3$. For this purpose, let u and
v be nonzero vectors in $R^2$ or $R^3$ that have been positioned so that their initial points
coincide. We define the angle between u and v to be the angle $\theta$ determined by u and v
that satisfies the inequalities $0 \le \theta \le \pi$ (Figure 3.2.4).



DEFINITION 3 If u and v are nonzero vectors in $R^2$ or $R^3$, and if $\theta$ is the angle between
u and v, then the dot product (also called the Euclidean inner product) of u and v is
denoted by u • v and is defined as

$$\mathbf{u} \cdot \mathbf{v} = \|\mathbf{u}\|\,\|\mathbf{v}\| \cos\theta \tag{12}$$

If u = 0 or v = 0, then we define u • v to be 0.


The sign of the dot product reveals information about the angle $\theta$ that we can obtain
by rewriting Formula (12) as

$$\cos\theta = \frac{\mathbf{u} \cdot \mathbf{v}}{\|\mathbf{u}\|\,\|\mathbf{v}\|} \tag{13}$$

Since $0 \le \theta \le \pi$, it follows from Formula (13) and properties of the cosine function
studied in trigonometry that

$\theta$ is acute if $\mathbf{u} \cdot \mathbf{v} > 0$.

$\theta$ is obtuse if $\mathbf{u} \cdot \mathbf{v} < 0$.

$\theta = \pi/2$ if $\mathbf{u} \cdot \mathbf{v} = 0$.
▲ Figure 3.2.5


Component Form of the 
Dot Product 

▲ Figure 3.2.6


Although we derived Formula (15) and its 2-space companion under the assumption that
u and v are nonzero, it turned out that these formulas are also applicable if u = 0 or
v = 0 (verify).


► EXAMPLE 5 Dot Product

Find the dot product of the vectors shown in Figure 3.2.5.

Solution The lengths of the vectors are

$$\|\mathbf{u}\| = 1 \quad \text{and} \quad \|\mathbf{v}\| = \sqrt{8} = 2\sqrt{2}$$

and the cosine of the angle $\theta$ between them is

$$\cos(45^\circ) = 1/\sqrt{2}$$

Thus, it follows from Formula (12) that

$$\mathbf{u} \cdot \mathbf{v} = \|\mathbf{u}\|\,\|\mathbf{v}\| \cos\theta = (1)(2\sqrt{2})(1/\sqrt{2}) = 2$$ ◄


For computational purposes it is desirable to have a formula that expresses the dot 
product of two vectors in terms of components. We will derive such a formula for 
vectors in 3-space; the derivation for vectors in 2-space is similar. 

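Let $\mathbf{u} = (u_1, u_2, u_3)$ and $\mathbf{v} = (v_1, v_2, v_3)$ be two nonzero vectors. If, as shown in
Figure 3.2.6, $\theta$ is the angle between u and v, then the law of cosines yields

$$\|\overrightarrow{PQ}\|^2 = \|\mathbf{u}\|^2 + \|\mathbf{v}\|^2 - 2\|\mathbf{u}\|\,\|\mathbf{v}\| \cos\theta \tag{14}$$

Since $\overrightarrow{PQ} = \mathbf{v} - \mathbf{u}$, we can rewrite (14) as

$$\|\mathbf{u}\|\,\|\mathbf{v}\| \cos\theta = \tfrac{1}{2}\left(\|\mathbf{u}\|^2 + \|\mathbf{v}\|^2 - \|\mathbf{v} - \mathbf{u}\|^2\right)$$

or

$$\mathbf{u} \cdot \mathbf{v} = \tfrac{1}{2}\left(\|\mathbf{u}\|^2 + \|\mathbf{v}\|^2 - \|\mathbf{v} - \mathbf{u}\|^2\right)$$

Substituting

$$\|\mathbf{u}\|^2 = u_1^2 + u_2^2 + u_3^2, \qquad \|\mathbf{v}\|^2 = v_1^2 + v_2^2 + v_3^2$$

and

$$\|\mathbf{v} - \mathbf{u}\|^2 = (v_1 - u_1)^2 + (v_2 - u_2)^2 + (v_3 - u_3)^2$$

we obtain, after simplifying,

$$\mathbf{u} \cdot \mathbf{v} = u_1v_1 + u_2v_2 + u_3v_3 \tag{15}$$

The companion formula for vectors in 2-space is

$$\mathbf{u} \cdot \mathbf{v} = u_1v_1 + u_2v_2 \tag{16}$$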
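
The simplification that leads to Formula (15) is routine but worth seeing once. As a sketch of the omitted algebra, expanding the squared differences gives:

```latex
\begin{aligned}
\|\mathbf{v}-\mathbf{u}\|^2
  &= (v_1-u_1)^2 + (v_2-u_2)^2 + (v_3-u_3)^2 \\
  &= (v_1^2+v_2^2+v_3^2) + (u_1^2+u_2^2+u_3^2) - 2(u_1v_1+u_2v_2+u_3v_3) \\
  &= \|\mathbf{v}\|^2 + \|\mathbf{u}\|^2 - 2(u_1v_1+u_2v_2+u_3v_3)
\end{aligned}
```

Substituting this into $\tfrac{1}{2}\left(\|\mathbf{u}\|^2 + \|\mathbf{v}\|^2 - \|\mathbf{v}-\mathbf{u}\|^2\right)$ cancels the squared norms and leaves $u_1v_1 + u_2v_2 + u_3v_3$, which is Formula (15). The 2-space case is the same computation with the third component omitted.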

Motivated by the pattern in Formulas (15) and (16), we make the following definition. 



Josiah Willard Gibbs 
(1839-1903) 


The dot product notation was first introduced by the American physicist and
mathematician J. Willard Gibbs in a pamphlet distributed to his students at Yale
University in the 1880s. The product was originally written on the baseline, rather than
centered as today, and was referred to as the direct product. Gibbs's pamphlet was
eventually incorporated into a book entitled Vector Analysis that was published in 1901
and coauthored with one of his students. Gibbs made major contributions to the fields of
thermodynamics and electromagnetic theory and is generally regarded as the greatest
American physicist of the nineteenth century.

[Image: SCIENCE SOURCE/Photo Researchers/Getty Images]
In words, to calculate the dot product (Euclidean inner product) multiply corresponding
components and add the resulting products.


▲ Figure 3.2.7


Note that the angle $\theta$ obtained in Example 7 does not involve k. Why was this to be
expected?


Algebraic Properties of the 
Dot Product 


DEFINITION 4 If $\mathbf{u} = (u_1, u_2, \ldots, u_n)$ and $\mathbf{v} = (v_1, v_2, \ldots, v_n)$ are vectors in $R^n$, then
the dot product (also called the Euclidean inner product) of u and v is denoted by u • v
and is defined by

$$\mathbf{u} \cdot \mathbf{v} = u_1v_1 + u_2v_2 + \cdots + u_nv_n \tag{17}$$


> EXAMPLE 6 Calculating Dot Products Using Components 

(a) Use Formula (15) to compute the dot product of the vectors u and v in Example 5.

(b) Calculate u • v for the following vectors in $R^4$:

u = (-1, 3, 5, 7), v = (-3, -4, 1, 0)

Solution (a) The component forms of the vectors are u = (0, 0, 1) and v = (0, 2, 2).
Thus,

$$\mathbf{u} \cdot \mathbf{v} = (0)(0) + (0)(2) + (1)(2) = 2$$

which agrees with the result obtained geometrically in Example 5.

Solution (b)

$$\mathbf{u} \cdot \mathbf{v} = (-1)(-3) + (3)(-4) + (5)(1) + (7)(0) = -4$$
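
The rule "multiply corresponding components and add" in Formula (17) is a one-line computation. The following Python sketch reproduces both parts of Example 6; the function name `dot` is an illustrative choice.

```python
def dot(u, v):
    """Dot product u . v = u_1 v_1 + ... + u_n v_n of Formula (17)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same number of components")
    return sum(a * b for a, b in zip(u, v))

print(dot((0, 0, 1), (0, 2, 2)))            # 2, part (a) of Example 6
print(dot((-1, 3, 5, 7), (-3, -4, 1, 0)))   # -4, part (b) of Example 6
```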


► EXAMPLE 7 A Geometry Problem Solved Using Dot Product

Find the angle between a diagonal of a cube and one of its edges.


Solution Let k be the length of an edge and introduce a coordinate system as shown in
Figure 3.2.7. If we let $\mathbf{u}_1 = (k, 0, 0)$, $\mathbf{u}_2 = (0, k, 0)$, and $\mathbf{u}_3 = (0, 0, k)$, then the vector

$$\mathbf{d} = (k, k, k) = \mathbf{u}_1 + \mathbf{u}_2 + \mathbf{u}_3$$

is a diagonal of the cube. It follows from Formula (13) that the angle $\theta$ between d and
the edge $\mathbf{u}_1$ satisfies

$$\cos\theta = \frac{\mathbf{u}_1 \cdot \mathbf{d}}{\|\mathbf{u}_1\|\,\|\mathbf{d}\|} = \frac{k^2}{(k)(\sqrt{3k^2})} = \frac{1}{\sqrt{3}}$$

With the help of a calculator we obtain

$$\theta = \cos^{-1}\left(\frac{1}{\sqrt{3}}\right) \approx 54.74^\circ$$ ◄
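
The calculator step in Example 7 can be carried out with a short function based on Formula (13). This is a minimal Python sketch; the name `angle_degrees`, the clamping of the cosine against rounding error, and the edge length `k = 2.0` are assumptions made for illustration.

```python
import math

def angle_degrees(u, v):
    """Angle between nonzero vectors u and v in degrees, from Formula (13)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    cos_theta = dot / (norm_u * norm_v)
    # Guard against values such as 1.0000000000000002 produced by rounding
    cos_theta = max(-1.0, min(1.0, cos_theta))
    return math.degrees(math.acos(cos_theta))

k = 2.0                                          # hypothetical edge length
theta = angle_degrees((k, 0, 0), (k, k, k))      # edge u_1 and diagonal d
print(round(theta, 2))                           # 54.74
```

Changing `k` to any other positive value leaves the result unchanged, which is consistent with the observation that the angle does not involve k.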


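In the special case where u = v in Definition 4, we obtain the relationship

$$\mathbf{v} \cdot \mathbf{v} = v_1^2 + v_2^2 + \cdots + v_n^2 = \|\mathbf{v}\|^2 \tag{18}$$

This yields the following formula for expressing the length of a vector in terms of a dot
product:

$$\|\mathbf{v}\| = \sqrt{\mathbf{v} \cdot \mathbf{v}} \tag{19}$$

Dot products have many of the same algebraic properties as products of real numbers.


THEOREM 3.2.2 If u, v, and w are vectors in $R^n$, and if k is a scalar, then:

(a) $\mathbf{u} \cdot \mathbf{v} = \mathbf{v} \cdot \mathbf{u}$ [Symmetry property]

(b) $\mathbf{u} \cdot (\mathbf{v} + \mathbf{w}) = \mathbf{u} \cdot \mathbf{v} + \mathbf{u} \cdot \mathbf{w}$ [Distributive property]

(c) $k(\mathbf{u} \cdot \mathbf{v}) = (k\mathbf{u}) \cdot \mathbf{v}$ [Homogeneity property]

(d) $\mathbf{v} \cdot \mathbf{v} \ge 0$ and $\mathbf{v} \cdot \mathbf{v} = 0$ if and only if $\mathbf{v} = \mathbf{0}$ [Positivity property]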

We will prove parts (c) and (d) and leave the other proofs as exercises. 
Cauchy-Schwarz Inequality
and Angles in $R^n$


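Proof (c) Let $\mathbf{u} = (u_1, u_2, \ldots, u_n)$ and $\mathbf{v} = (v_1, v_2, \ldots, v_n)$. Then

$$\begin{aligned}
k(\mathbf{u} \cdot \mathbf{v}) &= k(u_1v_1 + u_2v_2 + \cdots + u_nv_n) \\
&= (ku_1)v_1 + (ku_2)v_2 + \cdots + (ku_n)v_n = (k\mathbf{u}) \cdot \mathbf{v}
\end{aligned}$$

Proof (d) The result follows from parts (a) and (b) of Theorem 3.2.1 and the fact that

$$\mathbf{v} \cdot \mathbf{v} = v_1v_1 + v_2v_2 + \cdots + v_nv_n = v_1^2 + v_2^2 + \cdots + v_n^2 = \|\mathbf{v}\|^2$$

The next theorem gives additional properties of dot products. The proofs can be
obtained either by expressing the vectors in terms of components or by using the algebraic
properties established in Theorem 3.2.2.


THEOREM 3.2.3 If u, v, and w are vectors in $R^n$, and if k is a scalar, then:

(a) $\mathbf{0} \cdot \mathbf{v} = \mathbf{v} \cdot \mathbf{0} = 0$

(b) $(\mathbf{u} + \mathbf{v}) \cdot \mathbf{w} = \mathbf{u} \cdot \mathbf{w} + \mathbf{v} \cdot \mathbf{w}$

(c) $\mathbf{u} \cdot (\mathbf{v} - \mathbf{w}) = \mathbf{u} \cdot \mathbf{v} - \mathbf{u} \cdot \mathbf{w}$

(d) $(\mathbf{u} - \mathbf{v}) \cdot \mathbf{w} = \mathbf{u} \cdot \mathbf{w} - \mathbf{v} \cdot \mathbf{w}$

(e) $k(\mathbf{u} \cdot \mathbf{v}) = \mathbf{u} \cdot (k\mathbf{v})$


We will show how Theorem 3.2.2 can be used to prove part (b) without breaking the
vectors into components. The other proofs are left as exercises.

Proof (b)

$$\begin{aligned}
(\mathbf{u} + \mathbf{v}) \cdot \mathbf{w} &= \mathbf{w} \cdot (\mathbf{u} + \mathbf{v}) && [\text{By symmetry}] \\
&= \mathbf{w} \cdot \mathbf{u} + \mathbf{w} \cdot \mathbf{v} && [\text{By distributivity}] \\
&= \mathbf{u} \cdot \mathbf{w} + \mathbf{v} \cdot \mathbf{w} && [\text{By symmetry}]
\end{aligned}$$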

Formulas (18) and (19) together with Theorems 3.2.2 and 3.2.3 make it possible to 
manipulate expressions involving dot products using familiar algebraic techniques. 


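► EXAMPLE 8 Calculating with Dot Products

$$\begin{aligned}
(\mathbf{u} - 2\mathbf{v}) \cdot (3\mathbf{u} + 4\mathbf{v}) &= \mathbf{u} \cdot (3\mathbf{u} + 4\mathbf{v}) - 2\mathbf{v} \cdot (3\mathbf{u} + 4\mathbf{v}) \\
&= 3(\mathbf{u} \cdot \mathbf{u}) + 4(\mathbf{u} \cdot \mathbf{v}) - 6(\mathbf{v} \cdot \mathbf{u}) - 8(\mathbf{v} \cdot \mathbf{v}) \\
&= 3\|\mathbf{u}\|^2 - 2(\mathbf{u} \cdot \mathbf{v}) - 8\|\mathbf{v}\|^2
\end{aligned}$$ ◄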
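
An identity such as the one in Example 8 can be spot-checked numerically. The sketch below evaluates both sides for one pair of sample vectors; the vectors and the helper names `dot` and `combo` are hypothetical, and agreement for one pair illustrates the identity but does not prove it.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def combo(a, u, b, v):
    """Return the vector a*u + b*v."""
    return tuple(a * x + b * y for x, y in zip(u, v))

u = (1, -2, 3)    # sample vectors chosen only for illustration
v = (4, 0, -1)

lhs = dot(combo(1, u, -2, v), combo(3, u, 4, v))       # (u - 2v) . (3u + 4v)
rhs = 3 * dot(u, u) - 2 * dot(u, v) - 8 * dot(v, v)    # 3||u||^2 - 2(u . v) - 8||v||^2
print(lhs, rhs)                                        # -96 -96
```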

Our next objective is to extend to $R^n$ the notion of “angle” between nonzero vectors u
and v. We will do this by starting with the formula

$$\theta = \cos^{-1}\left(\frac{\mathbf{u} \cdot \mathbf{v}}{\|\mathbf{u}\|\,\|\mathbf{v}\|}\right) \tag{20}$$

which we previously derived for nonzero vectors in $R^2$ and $R^3$. Since dot products and
norms have been defined for vectors in $R^n$, it would seem that this formula has all the
ingredients to serve as a definition of the angle $\theta$ between two vectors, u and v, in $R^n$.
However, there is a fly in the ointment, the problem being that the inverse cosine in
Formula (20) is not defined unless its argument satisfies the inequalities

$$-1 \le \frac{\mathbf{u} \cdot \mathbf{v}}{\|\mathbf{u}\|\,\|\mathbf{v}\|} \le 1 \tag{21}$$

Fortunately, these inequalities do hold for all nonzero vectors in $R^n$ as a result of the
following fundamental result known as the Cauchy-Schwarz inequality.
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Geometry 



$\|\mathbf{u} + \mathbf{v}\| \le \|\mathbf{u}\| + \|\mathbf{v}\|$

▲ Figure 3.2.8


$d(\mathbf{u}, \mathbf{v}) \le d(\mathbf{u}, \mathbf{w}) + d(\mathbf{w}, \mathbf{v})$

▲ Figure 3.2.9


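THEOREM 3.2.4 Cauchy-Schwarz Inequality

If $\mathbf{u} = (u_1, u_2, \ldots, u_n)$ and $\mathbf{v} = (v_1, v_2, \ldots, v_n)$ are vectors in $R^n$, then

$$|\mathbf{u} \cdot \mathbf{v}| \le \|\mathbf{u}\|\,\|\mathbf{v}\| \tag{22}$$

or in terms of components

$$|u_1v_1 + u_2v_2 + \cdots + u_nv_n| \le (u_1^2 + u_2^2 + \cdots + u_n^2)^{1/2}(v_1^2 + v_2^2 + \cdots + v_n^2)^{1/2} \tag{23}$$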
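
Inequality (22) is easy to test on specific vectors. The following Python sketch does so for two cases; the helper names and the small tolerance used to absorb floating-point rounding are assumptions of this illustration, and a numerical check of this kind is no substitute for the proof given later in the text.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(v):
    return math.sqrt(dot(v, v))

def satisfies_cauchy_schwarz(u, v, tol=1e-12):
    """True when |u . v| <= ||u|| ||v||, allowing a tiny rounding tolerance."""
    return abs(dot(u, v)) <= norm(u) * norm(v) + tol

u, v = (-3, 1, 0), (2, -1, 3)
print(abs(dot(u, v)), norm(u) * norm(v))          # 7 and sqrt(140), about 11.83
print(satisfies_cauchy_schwarz(u, v))             # True
# When one vector is a scalar multiple of the other, the two sides agree
print(satisfies_cauchy_schwarz(u, (-6, 2, 0)))    # True
```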


We will omit the proof of this theorem because later in the text we will prove a more
general version of which this will be a special case. Our goal for now will be to use this
theorem to prove that the inequalities in (21) hold for all nonzero vectors in $R^n$. Once
that is done we will have established all the results required to use Formula (20) as our
definition of the angle between nonzero vectors u and v in $R^n$.

To prove that the inequalities in (21) hold for all nonzero vectors in $R^n$, divide both
sides of Formula (22) by the product ||u|| ||v|| to obtain

$$\frac{|\mathbf{u} \cdot \mathbf{v}|}{\|\mathbf{u}\|\,\|\mathbf{v}\|} \le 1 \quad \text{or equivalently} \quad \left|\frac{\mathbf{u} \cdot \mathbf{v}}{\|\mathbf{u}\|\,\|\mathbf{v}\|}\right| \le 1$$

from which (21) follows.

Geometry in $R^n$

Earlier in this section we extended various concepts to $R^n$ with the idea that familiar
results that we can visualize in $R^2$ and $R^3$ might be valid in $R^n$ as well. Here are two
fundamental theorems from plane geometry whose validity extends to $R^n$:

The sum of the lengths of two sides of a triangle is at least as large as the third (Figure
3.2.8).

The shortest distance between two points is a straight line (Figure 3.2.9).

The following theorem generalizes these theorems to $R^n$.


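THEOREM 3.2.5 If u, v, and w are vectors in $R^n$, then:

(a) $\|\mathbf{u} + \mathbf{v}\| \le \|\mathbf{u}\| + \|\mathbf{v}\|$ [Triangle inequality for vectors]

(b) $d(\mathbf{u}, \mathbf{v}) \le d(\mathbf{u}, \mathbf{w}) + d(\mathbf{w}, \mathbf{v})$ [Triangle inequality for distances]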
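
Both parts of the triangle inequality can be checked on concrete vectors. The Python sketch below does this for one hypothetical triple u, v, w; the helper names and the rounding tolerance are assumptions made for the illustration.

```python
import math

def norm(v):
    return math.sqrt(sum(x * x for x in v))

def distance(u, v):
    return norm(tuple(a - b for a, b in zip(u, v)))

u, v, w = (1, -2, 3), (4, 0, -1), (2, 5, 0)   # sample vectors, not from the text
tol = 1e-12                                   # absorbs floating-point rounding

u_plus_v = tuple(a + b for a, b in zip(u, v))
vector_ok = norm(u_plus_v) <= norm(u) + norm(v) + tol                 # part (a)
distance_ok = distance(u, v) <= distance(u, w) + distance(w, v) + tol  # part (b)
print(vector_ok, distance_ok)                 # True True
```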



Hermann Amandus Schwarz (1843-1921)
Viktor Yakovlevich Bunyakovsky (1804-1889)

The Cauchy-Schwarz inequality is named in honor of the French mathematician
Augustin Cauchy (see p. 121) and the German mathematician Hermann Schwarz.
Variations of this inequality occur in many different settings and under various names.
Depending on the context in which the inequality occurs, you may find it called Cauchy's
inequality, the Schwarz inequality, or sometimes even the Bunyakovsky inequality, in
recognition of the Russian mathematician who published his version of the inequality in
1859, about 25 years before Schwarz.

[Images: ©Rudolph Duehrkoop/ullstein bild/The Image Works (Schwarz);
http://www-history.mcs.st-and.ac.uk/Biographies/Bunyakovsky.html (Bunyakovsky)]
▲ Figure 3.2.10


Note that Formula (25) expresses the dot product in terms of norms.


Dot Products as Matrix 
Multiplication 


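Proof (a)

$$\begin{aligned}
\|\mathbf{u} + \mathbf{v}\|^2 &= (\mathbf{u} + \mathbf{v}) \cdot (\mathbf{u} + \mathbf{v}) = (\mathbf{u} \cdot \mathbf{u}) + 2(\mathbf{u} \cdot \mathbf{v}) + (\mathbf{v} \cdot \mathbf{v}) \\
&= \|\mathbf{u}\|^2 + 2(\mathbf{u} \cdot \mathbf{v}) + \|\mathbf{v}\|^2 \\
&\le \|\mathbf{u}\|^2 + 2|\mathbf{u} \cdot \mathbf{v}| + \|\mathbf{v}\|^2 && [\text{Property of absolute value}] \\
&\le \|\mathbf{u}\|^2 + 2\|\mathbf{u}\|\,\|\mathbf{v}\| + \|\mathbf{v}\|^2 && [\text{Cauchy-Schwarz inequality}] \\
&= (\|\mathbf{u}\| + \|\mathbf{v}\|)^2
\end{aligned}$$

This completes the proof since both sides of the inequality in part (a) are nonnegative.

Proof (b) It follows from part (a) and Formula (11) that

$$\begin{aligned}
d(\mathbf{u}, \mathbf{v}) &= \|\mathbf{u} - \mathbf{v}\| = \|(\mathbf{u} - \mathbf{w}) + (\mathbf{w} - \mathbf{v})\| \\
&\le \|\mathbf{u} - \mathbf{w}\| + \|\mathbf{w} - \mathbf{v}\| = d(\mathbf{u}, \mathbf{w}) + d(\mathbf{w}, \mathbf{v})
\end{aligned}$$

It is proved in plane geometry that for any parallelogram the sum of the squares of
the diagonals is equal to the sum of the squares of the four sides (Figure 3.2.10). The
following theorem generalizes that result to $R^n$.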


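THEOREM 3.2.6 Parallelogram Equation for Vectors

If u and v are vectors in $R^n$, then

$$\|\mathbf{u} + \mathbf{v}\|^2 + \|\mathbf{u} - \mathbf{v}\|^2 = 2\left(\|\mathbf{u}\|^2 + \|\mathbf{v}\|^2\right) \tag{24}$$


Proof

$$\begin{aligned}
\|\mathbf{u} + \mathbf{v}\|^2 + \|\mathbf{u} - \mathbf{v}\|^2 &= (\mathbf{u} + \mathbf{v}) \cdot (\mathbf{u} + \mathbf{v}) + (\mathbf{u} - \mathbf{v}) \cdot (\mathbf{u} - \mathbf{v}) \\
&= 2(\mathbf{u} \cdot \mathbf{u}) + 2(\mathbf{v} \cdot \mathbf{v}) \\
&= 2\left(\|\mathbf{u}\|^2 + \|\mathbf{v}\|^2\right)
\end{aligned}$$

We could state and prove many more theorems from plane geometry that generalize
to $R^n$, but the ones already given should suffice to convince you that $R^n$ is not so different
from $R^2$ and $R^3$ even though we cannot visualize it directly. The next theorem establishes
a fundamental relationship between the dot product and norm in $R^n$.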


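THEOREM 3.2.7 If u and v are vectors in $R^n$ with the Euclidean inner product, then

$$\mathbf{u} \cdot \mathbf{v} = \tfrac{1}{4}\|\mathbf{u} + \mathbf{v}\|^2 - \tfrac{1}{4}\|\mathbf{u} - \mathbf{v}\|^2 \tag{25}$$


Proof

$$\|\mathbf{u} + \mathbf{v}\|^2 = (\mathbf{u} + \mathbf{v}) \cdot (\mathbf{u} + \mathbf{v}) = \|\mathbf{u}\|^2 + 2(\mathbf{u} \cdot \mathbf{v}) + \|\mathbf{v}\|^2$$
$$\|\mathbf{u} - \mathbf{v}\|^2 = (\mathbf{u} - \mathbf{v}) \cdot (\mathbf{u} - \mathbf{v}) = \|\mathbf{u}\|^2 - 2(\mathbf{u} \cdot \mathbf{v}) + \|\mathbf{v}\|^2$$

from which (25) follows by simple algebra.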
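
The parallelogram equation (24) and the relationship between dot product and norm in (25) can both be confirmed on a concrete pair of vectors. The sketch below works with squared norms so that integer inputs give exact results; the sample vectors and helper names are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm_sq(v):
    """Squared norm ||v||^2 = v . v, as in Formula (18)."""
    return dot(v, v)

u = (1, -2, 3)    # sample vectors chosen only for illustration
v = (4, 0, -1)
u_plus_v = tuple(a + b for a, b in zip(u, v))
u_minus_v = tuple(a - b for a, b in zip(u, v))

# Formula (24): both sides equal 62 for these vectors
print(norm_sq(u_plus_v) + norm_sq(u_minus_v), 2 * (norm_sq(u) + norm_sq(v)))
# Formula (25): the right side reproduces u . v, which is 1 here
print((norm_sq(u_plus_v) - norm_sq(u_minus_v)) / 4, dot(u, v))
```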


There are various ways to express the dot product of vectors using matrix notation. 
The formulas depend on whether the vectors are expressed as row matrices or column 
matrices. Table 1 shows the possibilities. 
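Table 1

Form: u a column matrix and v a column matrix
Dot Product: $\mathbf{u} \cdot \mathbf{v} = \mathbf{u}^T\mathbf{v} = \mathbf{v}^T\mathbf{u}$
Example: $\mathbf{u} = \begin{bmatrix} 1 \\ -3 \\ 5 \end{bmatrix}$, $\mathbf{v} = \begin{bmatrix} 5 \\ 4 \\ 0 \end{bmatrix}$
$\mathbf{u}^T\mathbf{v} = \begin{bmatrix} 1 & -3 & 5 \end{bmatrix} \begin{bmatrix} 5 \\ 4 \\ 0 \end{bmatrix} = -7$
$\mathbf{v}^T\mathbf{u} = \begin{bmatrix} 5 & 4 & 0 \end{bmatrix} \begin{bmatrix} 1 \\ -3 \\ 5 \end{bmatrix} = -7$

Form: u a row matrix and v a column matrix
Dot Product: $\mathbf{u} \cdot \mathbf{v} = \mathbf{u}\mathbf{v} = \mathbf{v}^T\mathbf{u}^T$
Example: $\mathbf{u} = \begin{bmatrix} 1 & -3 & 5 \end{bmatrix}$, $\mathbf{v} = \begin{bmatrix} 5 \\ 4 \\ 0 \end{bmatrix}$
$\mathbf{u}\mathbf{v} = \begin{bmatrix} 1 & -3 & 5 \end{bmatrix} \begin{bmatrix} 5 \\ 4 \\ 0 \end{bmatrix} = -7$
$\mathbf{v}^T\mathbf{u}^T = \begin{bmatrix} 5 & 4 & 0 \end{bmatrix} \begin{bmatrix} 1 \\ -3 \\ 5 \end{bmatrix} = -7$

Form: u a column matrix and v a row matrix
Dot Product: $\mathbf{u} \cdot \mathbf{v} = \mathbf{v}\mathbf{u} = \mathbf{u}^T\mathbf{v}^T$
Example: $\mathbf{u} = \begin{bmatrix} 1 \\ -3 \\ 5 \end{bmatrix}$, $\mathbf{v} = \begin{bmatrix} 5 & 4 & 0 \end{bmatrix}$
$\mathbf{v}\mathbf{u} = \begin{bmatrix} 5 & 4 & 0 \end{bmatrix} \begin{bmatrix} 1 \\ -3 \\ 5 \end{bmatrix} = -7$
$\mathbf{u}^T\mathbf{v}^T = \begin{bmatrix} 1 & -3 & 5 \end{bmatrix} \begin{bmatrix} 5 \\ 4 \\ 0 \end{bmatrix} = -7$

Form: u a row matrix and v a row matrix
Dot Product: $\mathbf{u} \cdot \mathbf{v} = \mathbf{u}\mathbf{v}^T = \mathbf{v}\mathbf{u}^T$
Example: $\mathbf{u} = \begin{bmatrix} 1 & -3 & 5 \end{bmatrix}$, $\mathbf{v} = \begin{bmatrix} 5 & 4 & 0 \end{bmatrix}$
$\mathbf{u}\mathbf{v}^T = \begin{bmatrix} 1 & -3 & 5 \end{bmatrix} \begin{bmatrix} 5 \\ 4 \\ 0 \end{bmatrix} = -7$
$\mathbf{v}\mathbf{u}^T = \begin{bmatrix} 5 & 4 & 0 \end{bmatrix} \begin{bmatrix} 1 \\ -3 \\ 5 \end{bmatrix} = -7$
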

Application of Dot Products to ISBN Numbers

Although the system has recently changed, most older books have been assigned a unique
10-digit number called an International Standard Book Number or ISBN. The first nine
digits of this number are split into three groups: the first group representing the country
or group of countries in which the book originates, the second identifying the publisher,
and the third assigned to the book title itself. The tenth and final digit, called a check
digit, is computed from the first nine digits and is used to ensure that an electronic
transmission of the ISBN, say over the Internet, occurs without error.

To explain how this is done, regard the first nine digits of the ISBN as a vector b in $R^9$,
and let a be the vector

a = (1, 2, 3, 4, 5, 6, 7, 8, 9)

Then the check digit c is computed using the following procedure:

Form the dot product a • b.

Divide a • b by 11, thereby producing a remainder c that is an integer between 0 and 10,
inclusive. The check digit is taken to be c, with the proviso that c = 10 is written as X to
avoid double digits.

For example, the ISBN of the brief edition of Calculus, sixth edition, by Howard Anton is

0-471-15307-9

which has a check digit of 9. This is consistent with the first nine digits of the ISBN, since

a • b = (1, 2, 3, 4, 5, 6, 7, 8, 9) • (0, 4, 7, 1, 1, 5, 3, 0, 7) = 152

Dividing 152 by 11 produces a quotient of 13 and a remainder of 9, so the check digit is
c = 9. If an electronic order is placed for a book with a certain ISBN, then the warehouse
can use the above procedure to verify that the check digit is consistent with the first
nine digits, thereby reducing the possibility of a costly shipping error.
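
The check-digit procedure described above is short enough to write out in full. The following Python sketch follows the two steps as stated; the function name `isbn10_check_digit` and the choice to pass the nine digits as a string are assumptions of this illustration.

```python
def isbn10_check_digit(first_nine):
    """Check digit for the first nine ISBN digits, given as a string such as '047115307'."""
    b = [int(ch) for ch in first_nine]
    if len(b) != 9:
        raise ValueError("exactly nine digits are required")
    a = range(1, 10)                                # the vector a = (1, 2, ..., 9)
    c = sum(ai * bi for ai, bi in zip(a, b)) % 11   # remainder of a . b on division by 11
    return "X" if c == 10 else str(c)

print(isbn10_check_digit("047115307"))   # 9, matching 0-471-15307-9
```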
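If A is an n × n matrix and u and v are n × 1 matrices, then it follows from the first
row in Table 1 and properties of the transpose that

$$A\mathbf{u} \cdot \mathbf{v} = \mathbf{v}^T(A\mathbf{u}) = (\mathbf{v}^TA)\mathbf{u} = (A^T\mathbf{v})^T\mathbf{u} = \mathbf{u} \cdot A^T\mathbf{v}$$
$$\mathbf{u} \cdot A\mathbf{v} = (A\mathbf{v})^T\mathbf{u} = (\mathbf{v}^TA^T)\mathbf{u} = \mathbf{v}^T(A^T\mathbf{u}) = A^T\mathbf{u} \cdot \mathbf{v}$$

The resulting formulas

$$A\mathbf{u} \cdot \mathbf{v} = \mathbf{u} \cdot A^T\mathbf{v} \tag{26}$$

$$\mathbf{u} \cdot A\mathbf{v} = A^T\mathbf{u} \cdot \mathbf{v} \tag{27}$$

provide an important link between multiplication by an n × n matrix A and multiplication
by $A^T$.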


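► EXAMPLE 9 Verifying that $A\mathbf{u} \cdot \mathbf{v} = \mathbf{u} \cdot A^T\mathbf{v}$

Suppose that

$$A = \begin{bmatrix} 1 & -2 & 3 \\ 2 & 4 & 1 \\ -1 & 0 & 1 \end{bmatrix}, \quad \mathbf{u} = \begin{bmatrix} -1 \\ 2 \\ 4 \end{bmatrix}, \quad \mathbf{v} = \begin{bmatrix} -2 \\ 0 \\ 5 \end{bmatrix}$$

Then

$$A\mathbf{u} = \begin{bmatrix} 1 & -2 & 3 \\ 2 & 4 & 1 \\ -1 & 0 & 1 \end{bmatrix} \begin{bmatrix} -1 \\ 2 \\ 4 \end{bmatrix} = \begin{bmatrix} 7 \\ 10 \\ 5 \end{bmatrix}$$

$$A^T\mathbf{v} = \begin{bmatrix} 1 & 2 & -1 \\ -2 & 4 & 0 \\ 3 & 1 & 1 \end{bmatrix} \begin{bmatrix} -2 \\ 0 \\ 5 \end{bmatrix} = \begin{bmatrix} -7 \\ 4 \\ -1 \end{bmatrix}$$

from which we obtain

$$A\mathbf{u} \cdot \mathbf{v} = 7(-2) + 10(0) + 5(5) = 11$$
$$\mathbf{u} \cdot A^T\mathbf{v} = (-1)(-7) + 2(4) + 4(-1) = 11$$

Thus, $A\mathbf{u} \cdot \mathbf{v} = \mathbf{u} \cdot A^T\mathbf{v}$ as guaranteed by Formula (26). We leave it for you to verify
that Formula (27) also holds.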
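
The arithmetic of Example 9, including the verification of Formula (27) that is left to the reader, can be checked with plain Python lists. This is a sketch only; the helpers `mat_vec` and `transpose` are hypothetical names introduced for the illustration.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def mat_vec(A, x):
    """Product Ax, with A stored as a list of rows."""
    return [dot(row, x) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

A = [[1, -2, 3], [2, 4, 1], [-1, 0, 1]]
u = [-1, 2, 4]
v = [-2, 0, 5]

# Formula (26): Au . v = u . A^T v
print(dot(mat_vec(A, u), v), dot(u, mat_vec(transpose(A), v)))   # 11 11
# Formula (27): u . Av = A^T u . v
print(dot(u, mat_vec(A, v)), dot(mat_vec(transpose(A), u), v))   # 17 17
```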


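A Dot Product View of Matrix Multiplication

Dot products provide another way of thinking about matrix multiplication. Recall that
if $A = [a_{ij}]$ is an m × r matrix and $B = [b_{ij}]$ is an r × n matrix, then the ijth entry of
AB is

$$a_{i1}b_{1j} + a_{i2}b_{2j} + \cdots + a_{ir}b_{rj}$$

which is the dot product of the ith row vector of A

$$\begin{bmatrix} a_{i1} & a_{i2} & \cdots & a_{ir} \end{bmatrix}$$

and the jth column vector of B

$$\begin{bmatrix} b_{1j} \\ b_{2j} \\ \vdots \\ b_{rj} \end{bmatrix}$$

Thus, if the row vectors of A are $\mathbf{r}_1, \mathbf{r}_2, \ldots, \mathbf{r}_m$ and the column vectors of B are
$\mathbf{c}_1, \mathbf{c}_2, \ldots, \mathbf{c}_n$, then the matrix product AB can be expressed as

$$AB = \begin{bmatrix}
\mathbf{r}_1 \cdot \mathbf{c}_1 & \mathbf{r}_1 \cdot \mathbf{c}_2 & \cdots & \mathbf{r}_1 \cdot \mathbf{c}_n \\
\mathbf{r}_2 \cdot \mathbf{c}_1 & \mathbf{r}_2 \cdot \mathbf{c}_2 & \cdots & \mathbf{r}_2 \cdot \mathbf{c}_n \\
\vdots & \vdots & & \vdots \\
\mathbf{r}_m \cdot \mathbf{c}_1 & \mathbf{r}_m \cdot \mathbf{c}_2 & \cdots & \mathbf{r}_m \cdot \mathbf{c}_n
\end{bmatrix} \tag{28}$$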
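
Formula (28) amounts to a recipe for matrix multiplication: take the dot product of each row of A with each column of B. A minimal Python sketch of that recipe follows; the function name `mat_mul` and the sample matrices are assumptions made for illustration.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def mat_mul(A, B):
    """Product AB built entry by entry as (row i of A) . (column j of B), as in Formula (28)."""
    if any(len(row) != len(B) for row in A):
        raise ValueError("each row of A must have as many entries as B has rows")
    columns = list(zip(*B))          # column vectors c_1, ..., c_n of B
    return [[dot(row, col) for col in columns] for row in A]

A = [[1, 2, 4],
     [2, 6, 0]]                      # a 2 x 3 sample matrix
B = [[4, 1, 4, 3],
     [0, -1, 3, 1],
     [2, 7, 5, 2]]                   # a 3 x 4 sample matrix
print(mat_mul(A, B))                 # [[12, 27, 30, 13], [8, -4, 26, 12]]
```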


Exercise Set 3.2 


In Exercises 1-2, find the norm of v, and a unit vector that is
oppositely directed to v.

1. (a) v = (2, 2, 2) (b) v = (1, 0, 2, 1, 3)

2. (a) v = (1, -1, 2) (b) v = (-2, 3, 3, -1)

In Exercises 3-4, evaluate the given expression with
u = (2, -2, 3), v = (1, -3, 4), and w = (3, 6, -4).

3. (a) $\|\mathbf{u} + \mathbf{v}\|$ (b) $\|\mathbf{u}\| + \|\mathbf{v}\|$
(c) $\|-2\mathbf{u} + 2\mathbf{v}\|$ (d) $\|3\mathbf{u} - 5\mathbf{v} + \mathbf{w}\|$

4. (a) $\|\mathbf{u} + \mathbf{v} + \mathbf{w}\|$ (b) $\|\mathbf{u} - \mathbf{v}\|$
(c) $\|3\mathbf{v}\| - 3\|\mathbf{v}\|$ (d) $\|\mathbf{u}\| - \|\mathbf{v}\|$

In Exercises 5-6, evaluate the given expression with
u = (-2, -1, 4, 5), v = (3, 1, -5, 7), and w = (-6, 2, 1, 1).

5. (a) $\|3\mathbf{u} - 5\mathbf{v} + \mathbf{w}\|$ (b) $\|3\mathbf{u}\| - 5\|\mathbf{v}\| + \|\mathbf{w}\|$
(c) $\|-\|\mathbf{u}\|\mathbf{v}\|$

6. (a) $\|\mathbf{u}\| + \|-2\mathbf{v}\| + \|-3\mathbf{w}\|$ (b) $\|\,\|\mathbf{u} - \mathbf{v}\|\mathbf{w}\|$

7. Let v = (-2, 3, 0, 6). Find all scalars k such that ||kv|| = 5.

8. Let v = (1, 1, 2, -3, 1). Find all scalars k such that ||kv|| = 4.

In Exercises 9-10, find u • v, u • u, and v • v.

9. (a) u = (3, 1, 4), v = (2, 2, -4)
(b) u = (1, 1, 4, 6), v = (2, -2, 3, -2)

10. (a) u = (1, 1, -2, 3), v = (-1, 0, 5, 1)
(b) u = (2, -1, 1, 0, -2), v = (1, 2, 2, 2, 1)

In Exercises 11-12, find the Euclidean distance between u and v
and the cosine of the angle between those vectors. State whether
that angle is acute, obtuse, or 90°.

11. (a) u = (3, 3, 3), v = (1, 0, 4)
(b) u = (0, -2, -1, 1), v = (-3, 2, 4, 4)

12. (a) u = (1, 2, -3, 0), v = (5, 1, 2, -2)
(b) u = (0, 1, 1, 1, 2), v = (2, 1, 0, -1, 3)

13. Suppose that a vector a in the xy-plane has a length of 9 units
and points in a direction that is 120° counterclockwise from
the positive x-axis, and a vector b in that plane has a length of
5 units and points in the positive y-direction. Find a • b.

14. Suppose that a vector a in the xy-plane points in a direction
that is 47° counterclockwise from the positive x-axis, and a
vector b in that plane points in a direction that is 43° clockwise
from the positive x-axis. What can you say about the
value of a • b?

In Exercises 15-16, determine whether the expression makes
sense mathematically. If not, explain why.

15. (a) $\mathbf{u} \cdot (\mathbf{v} \cdot \mathbf{w})$ (b) $\mathbf{u} \cdot (\mathbf{v} + \mathbf{w})$
(c) $\|\mathbf{u} \cdot \mathbf{v}\|$ (d) $(\mathbf{u} \cdot \mathbf{v}) - \|\mathbf{u}\|$

16. (a) $\|\mathbf{u}\| \cdot \|\mathbf{v}\|$ (b) $(\mathbf{u} \cdot \mathbf{v}) - \mathbf{w}$
(c) $(\mathbf{u} \cdot \mathbf{v}) - k$ (d) $k \cdot \mathbf{u}$

In Exercises 17-18, verify that the Cauchy-Schwarz inequality
holds.

17. (a) u = (-3, 1, 0), v = (2, -1, 3)
(b) u = (0, 2, 2, 1), v = (1, 1, 1, 1)

18. (a) u = (4, 1, 1), v = (1, 2, 3)
(b) u = (1, 2, 1, 2, 3), v = (0, 1, 1, 5, -2)

19. Let $\mathbf{r}_0 = (x_0, y_0)$ be a fixed vector in $R^2$. In each part, describe
in words the set of all vectors r = (x, y) that satisfy the stated
condition.

(a) $\|\mathbf{r} - \mathbf{r}_0\| = 1$ (b) $\|\mathbf{r} - \mathbf{r}_0\| \le 1$ (c) $\|\mathbf{r} - \mathbf{r}_0\| > 1$

20. Repeat the directions of Exercise 19 for vectors r = (x, y, z)
and $\mathbf{r}_0 = (x_0, y_0, z_0)$ in $R^3$.

Exercises 21-25 The direction of a nonzero vector v in an xyz-coordinate
system is completely determined by the angles $\alpha$, $\beta$,
and $\gamma$ between v and the standard unit vectors i, j, and k (Figure
Ex-21). These are called the direction angles of v, and their
cosines are called the direction cosines of v.

21. Use Formula (13) to show that the direction cosines of a vector
$\mathbf{v} = (v_1, v_2, v_3)$ in $R^3$ are

$$\cos\alpha = \frac{v_1}{\|\mathbf{v}\|}, \quad \cos\beta = \frac{v_2}{\|\mathbf{v}\|}, \quad \cos\gamma = \frac{v_3}{\|\mathbf{v}\|}$$
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◄ Figure Ex-21 


22. Use the result in Exercise 21 to show that

$$\cos^2\alpha + \cos^2\beta + \cos^2\gamma = 1$$

23. Show that two nonzero vectors $\mathbf{v}_1$ and $\mathbf{v}_2$ in $R^3$ are orthogonal
if and only if their direction cosines satisfy

$$\cos\alpha_1 \cos\alpha_2 + \cos\beta_1 \cos\beta_2 + \cos\gamma_1 \cos\gamma_2 = 0$$

24. The accompanying figure shows a cube. 

(a) Find the angle between the vectors d and u to the nearest 
degree. 

(b) Make a conjecture about the angle between the vectors 
d and v, and confirm your conjecture by computing the 
angle. 


25. Estimate, to the nearest degree, the angles that a diagonal of a
box with dimensions 10 cm × 15 cm × 25 cm makes with the
edges of the box.

26. If ||v|| = 2 and ||w|| = 3, what are the largest and smallest values
possible for ||v - w||? Give a geometric explanation of
your results.

27. What can you say about two nonzero vectors, u and v, that
satisfy the equation ||u + v|| = ||u|| + ||v||?

28. (a) What relationship must hold for the point p = (a, b, c)
to be equidistant from the origin and the xz-plane? Make
sure that the relationship you state is valid for positive and
negative values of a, b, and c.

(b) What relationship must hold for the point p = (a, b, c) to
be farther from the origin than from the xz-plane? Make
sure that the relationship you state is valid for positive and
negative values of a, b, and c.

29. State a procedure for finding a vector of a specified length m
that points in the same direction as a given vector v.

30. Under what conditions will the triangle inequality (Theorem
3.2.5a) be an equality? Explain your answer geometrically.

Exercises 31-32 The effect that a force has on an object depends
on the magnitude of the force and the direction in which it is
applied. Thus, forces can be regarded as vectors and represented
as arrows in which the length of the arrow specifies the magnitude
of the force, and the direction of the arrow specifies the direction in
which the force is applied. It is a fact of physics that force vectors
obey the parallelogram law in the sense that if two force vectors
$\mathbf{F}_1$ and $\mathbf{F}_2$ are applied at a point on an object, then the effect is
the same as if the single force $\mathbf{F}_1 + \mathbf{F}_2$ (called the resultant) were
applied at that point (see accompanying figure). Forces are commonly
measured in units called pounds-force (abbreviated lbf) or
Newtons (abbreviated N).

31. A particle is said to be in static equilibrium if the resultant of
all forces applied to it is zero. For the forces in the accompanying
figure, find the resultant F that must be applied to the
indicated point to produce static equilibrium. Describe F by
giving its magnitude and the angle in degrees that it makes
with the positive x-axis.

32. Follow the directions of Exercise 31.



Working with Proofs 

33. Prove parts (a) and (b) of Theorem 3.2.1.

34. Prove parts (a) and (c) of Theorem 3.2.3.

35. Prove parts (d) and (e) of Theorem 3.2.3.

True-False Exercises 

TF. In parts (a)-(j) determine whether the statement is true or 
false, and justify your answer. 
(a) If each component of a vector in $R^3$ is doubled, the norm of
that vector is doubled.

(b) In $R^2$, the vectors of norm 5 whose initial points are at the origin
have terminal points lying on a circle of radius 5 centered
at the origin.

(c) Every vector in $R^n$ has a positive norm.

(d) If v is a nonzero vector in $R^n$, there are exactly two unit vectors
that are parallel to v.

(e) If ||u|| = 2, ||v|| = 1, and u • v = 1, then the angle between u
and v is $\pi/3$ radians.

(f) The expressions (u • v) + w and u • (v + w) are both meaningful
and equal to each other.

(g) If u • v = u • w, then v = w.

(h) If u • v = 0, then either u = 0 or v = 0.

(i) In $R^2$, if u lies in the first quadrant and v lies in the third
quadrant, then u • v cannot be positive.

(j) For all vectors u, v, and w in $R^n$, we have

$$\|\mathbf{u} + \mathbf{v} + \mathbf{w}\| \le \|\mathbf{u}\| + \|\mathbf{v}\| + \|\mathbf{w}\|$$

Working with Technology 

T1. Let u be a vector in $R^{100}$ whose ith component is i, and let v
be the vector in $R^{100}$ whose ith component is 1/(i + 1). Find the
dot product of u and v.

T2. Find, to the nearest degree, the angles that a diagonal of a box
with dimensions 10 cm × 11 cm × 25 cm makes with the edges of
the box.


3.3 Orthogonality 

In the last section we defined the notion of “angle” between vectors in $R^n$. In this section
we will focus on the notion of “perpendicularity.” Perpendicular vectors in $R^n$ play an
important role in a wide variety of applications.


Orthogonal Vectors

Recall from Formula (20) in the previous section that the angle $\theta$ between two nonzero
vectors u and v in $R^n$ is defined by the formula

$$\theta = \cos^{-1}\left(\frac{\mathbf{u} \cdot \mathbf{v}}{\|\mathbf{u}\|\,\|\mathbf{v}\|}\right)$$

It follows from this that $\theta = \pi/2$ if and only if u • v = 0. Thus, we make the following
definition.


DEFINITION 1 Two nonzero vectors u and v in $R^n$ are said to be orthogonal (or
perpendicular) if u • v = 0. We will also agree that the zero vector in $R^n$ is orthogonal
to every vector in $R^n$.
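
Definition 1 reduces an orthogonality test to a single dot product. The Python sketch below applies it to the vectors used in Example 1; the name `are_orthogonal` is illustrative, and the exact comparison with zero is intended for integer components (for floating-point data a small tolerance would be the usual substitute).

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def are_orthogonal(u, v):
    """True when u . v = 0; the zero vector is therefore orthogonal to every vector."""
    return dot(u, v) == 0

print(are_orthogonal((-2, 3, 1, 4), (1, 2, 0, -1)))    # True

i, j, k = (1, 0, 0), (0, 1, 0), (0, 0, 1)              # standard unit vectors in R^3
print(all(are_orthogonal(a, b) for a, b in [(i, j), (i, k), (j, k)]))   # True
print(are_orthogonal((1, 1), (1, 2)))                  # False
```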


► EXAMPLE 1 Orthogonal Vectors 

(a) Show that u = (-2, 3, 1, 4) and v = (1, 2, 0, -1) are orthogonal vectors in $R^4$.

(b) Let S = {i, j, k} be the set of standard unit vectors in $R^3$. Show that each ordered
pair of vectors in S is orthogonal.

Solution (a) The vectors are orthogonal since

$$\mathbf{u} \cdot \mathbf{v} = (-2)(1) + (3)(2) + (1)(0) + (4)(-1) = 0$$

Solution (b) It suffices to show that

$$\mathbf{i} \cdot \mathbf{j} = \mathbf{i} \cdot \mathbf{k} = \mathbf{j} \cdot \mathbf{k} = 0$$

because it will follow automatically from the symmetry property of the dot product that

$$\mathbf{j} \cdot \mathbf{i} = \mathbf{k} \cdot \mathbf{i} = \mathbf{k} \cdot \mathbf{j} = 0$$

Although the orthogonality of the vectors in S is evident geometrically from Figure 3.2.2,
it is confirmed algebraically by the computations

$$\mathbf{i} \cdot \mathbf{j} = (1, 0, 0) \cdot (0, 1, 0) = 0$$
$$\mathbf{i} \cdot \mathbf{k} = (1, 0, 0) \cdot (0, 0, 1) = 0$$
$$\mathbf{j} \cdot \mathbf{k} = (0, 1, 0) \cdot (0, 0, 1) = 0$$ ◄

Using the computations in $R^3$ as a model, you should be able to see that each ordered
pair of standard unit vectors in $R^n$ is orthogonal.


Lines and Planes Determined by Points and Normals

Formula (1) is called the point-normal form of a line or plane and Formulas (2) and (3)
the component forms.



One learns in analytic geometry that a line in R^2 is determined uniquely by its slope and
one of its points, and that a plane in R^3 is determined uniquely by its “inclination” and
one of its points. One way of specifying slope and inclination is to use a nonzero vector
n, called a normal, that is orthogonal to the line or plane in question. For example,
Figure 3.3.1 shows the line through the point P_0(x_0, y_0) that has normal n = (a, b) and
the plane through the point P_0(x_0, y_0, z_0) that has normal n = (a, b, c). Both the line
and the plane are represented by the vector equation

n • P_0P = 0 (1)

where P is either an arbitrary point (x, y) on the line or an arbitrary point (x, y, z) in
the plane. The vector P_0P can be expressed in terms of components as

P_0P = (x - x_0, y - y_0) [line]

P_0P = (x - x_0, y - y_0, z - z_0) [plane]

Thus, Equation (1) can be written as

a(x - x_0) + b(y - y_0) = 0 [line] (2)

a(x - x_0) + b(y - y_0) + c(z - z_0) = 0 [plane] (3)

These are called the point-normal equations of the line and plane.
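Multiplying out a point-normal equation is mechanical, which makes it easy to sketch in code. The Python fragment below is one possible illustration, not part of the text: the function names `point_normal_to_general` and `satisfies` are hypothetical, and the sample normal and point are made up for the demonstration.

```python
def point_normal_to_general(normal, point):
    """Expand n . (P - P0) = 0 into the form  n . (x, y[, z]) + constant = 0.

    Works for a line in R^2 (two components) or a plane in R^3 (three).
    Returns the coefficient tuple (the normal itself) and the constant term.
    """
    if len(normal) != len(point):
        raise ValueError("normal and point must have the same number of components")
    if all(c == 0 for c in normal):
        raise ValueError("a normal vector must be nonzero")
    constant = -sum(n * p for n, p in zip(normal, point))
    return tuple(normal), constant


def satisfies(normal, point, candidate):
    """Check whether `candidate` satisfies n . (candidate - point) = 0."""
    return sum(n * (c - p) for n, c, p in zip(normal, candidate, point)) == 0


# Line through (1, 3) with normal (2, -1):  2(x - 1) - (y - 3) = 0
coeffs, constant = point_normal_to_general((2, -1), (1, 3))
print(coeffs, constant)
print(satisfies((2, -1), (1, 3), (2, 5)))
```

The constant term is simply -(n • P_0), which is why the normal can be read off from the coefficients of the expanded equation.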


► Figure 3.3.1


► EXAMPLE 2 Point-Normal Equations

It follows from (2) that in R 2 the equation 

6(x - 3) + (y + 7) = 0 

represents the line through the point (3, —7) with normal n = (6, 1); and it follows from 
(3) that in R 3 the equation 

4(x - 3) + 2y - 5(z - 7) = 0

represents the plane through the point (3, 0, 7) with normal n = (4, 2, -5).


When convenient, the terms in Equations (2) and (3) can be multiplied out and the
constants combined. This leads to the following theorem.


THEOREM 3.3.1

(a) If a and b are constants that are not both zero, then an equation of the form

ax + by + c = 0 (4)

represents a line in R^2 with normal n = (a, b).

(b) If a, b, and c are constants that are not all zero, then an equation of the form

ax + by + cz + d = 0 (5)

represents a plane in R^3 with normal n = (a, b, c).


► EXAMPLE 3 Vectors Orthogonal to Lines and Planes Through the Origin

(a) The equation ax + by = 0 represents a line through the origin in R^2. Show that
the vector n_1 = (a, b) formed from the coefficients of the equation is orthogonal to
the line, that is, orthogonal to every vector along the line.

(b) The equation ax + by + cz = 0 represents a plane through the origin in R^3. Show
that the vector n_2 = (a, b, c) formed from the coefficients of the equation is orthogonal
to the plane, that is, orthogonal to every vector that lies in the plane.

Solution We will solve both problems together. The two equations can be written as

(a, b) • (x, y) = 0 and (a, b, c) • (x, y, z) = 0

or, alternatively, as

n_1 • (x, y) = 0 and n_2 • (x, y, z) = 0

These equations show that n_1 is orthogonal to every vector (x, y) on the line and that n_2
is orthogonal to every vector (x, y, z) in the plane (Figure 3.3.1).


Recall that

ax + by = 0 and ax + by + cz = 0

are called homogeneous equations. Example 3 illustrates that homogeneous equations
in two or three unknowns can be written in the vector form

n • x = 0 (6)

where n is the vector of coefficients and x is the vector of unknowns. In R^2 this is called
the vector form of a line through the origin, and in R^3 it is called the vector form of a
plane through the origin.

Referring to Table 1 of Section 3.2, in what other ways can you write (6) if n and x are
expressed in matrix form?


Orthogonal Projections

In many applications it is necessary to “decompose” a vector u into a sum of two terms,
one term being a scalar multiple of a specified nonzero vector a and the other term being
orthogonal to a. For example, if u and a are vectors in R^2 that are positioned so their
initial points coincide at a point Q, then we can create such a decomposition as follows
(Figure 3.3.2):

• Drop a perpendicular from the tip of u to the line through a.

• Construct the vector w_1 from Q to the foot of the perpendicular.

• Construct the vector w_2 = u - w_1.

Since

w_1 + w_2 = w_1 + (u - w_1) = u

we have decomposed u into a sum of two orthogonal vectors, the first term being a scalar
multiple of a and the second being orthogonal to a.





▲ Figure 3.3.2 Three possible cases. 


The following theorem shows that the foregoing results, which we illustrated using
vectors in R^2, apply as well in R^n.


THEOREM 3.3.2 Projection Theorem

If u and a are vectors in R^n, and if a ≠ 0, then u can be expressed in exactly one way
in the form u = w_1 + w_2, where w_1 is a scalar multiple of a and w_2 is orthogonal to a.


Proof Since the vector w_1 is to be a scalar multiple of a, it must have the form

$$\mathbf{w}_1 = k\mathbf{a} \tag{7}$$

Our goal is to find a value of the scalar k and a vector w_2 that is orthogonal to a such
that

$$\mathbf{u} = \mathbf{w}_1 + \mathbf{w}_2 \tag{8}$$

We can determine k by using (7) to rewrite (8) as

$$\mathbf{u} = \mathbf{w}_1 + \mathbf{w}_2 = k\mathbf{a} + \mathbf{w}_2$$

and then applying Theorems 3.2.2 and 3.2.3 to obtain

$$\mathbf{u}\cdot\mathbf{a} = (k\mathbf{a} + \mathbf{w}_2)\cdot\mathbf{a} = k\|\mathbf{a}\|^2 + (\mathbf{w}_2\cdot\mathbf{a}) \tag{9}$$

Since w_2 is to be orthogonal to a, the last term in (9) must be 0, and hence k must satisfy
the equation

$$\mathbf{u}\cdot\mathbf{a} = k\|\mathbf{a}\|^2$$

from which we obtain

$$k = \frac{\mathbf{u}\cdot\mathbf{a}}{\|\mathbf{a}\|^2}$$

as the only possible value for k. The proof can be completed by rewriting (8) as

$$\mathbf{w}_2 = \mathbf{u} - \mathbf{w}_1 = \mathbf{u} - k\mathbf{a} = \mathbf{u} - \frac{\mathbf{u}\cdot\mathbf{a}}{\|\mathbf{a}\|^2}\,\mathbf{a}$$

and then confirming that w_2 is orthogonal to a by showing that w_2 • a = 0 (we leave the
details for you).

The vectors w_1 and w_2 in the Projection Theorem have associated names: the vector
w_1 is called the orthogonal projection of u on a or sometimes the vector component of
u along a, and the vector w_2 is called the vector component of u orthogonal to a. The
vector w_1 is commonly denoted by the symbol proj_a u, in which case it follows from (8)
that w_2 = u - proj_a u. In summary,

$$\operatorname{proj}_{\mathbf{a}}\mathbf{u} = \frac{\mathbf{u}\cdot\mathbf{a}}{\|\mathbf{a}\|^2}\,\mathbf{a} \quad \text{(vector component of } \mathbf{u} \text{ along } \mathbf{a}\text{)} \tag{10}$$

$$\mathbf{u} - \operatorname{proj}_{\mathbf{a}}\mathbf{u} = \mathbf{u} - \frac{\mathbf{u}\cdot\mathbf{a}}{\|\mathbf{a}\|^2}\,\mathbf{a} \quad \text{(vector component of } \mathbf{u} \text{ orthogonal to } \mathbf{a}\text{)} \tag{11}$$
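The two summary formulas can be tried out numerically. What follows is a small Python sketch under stated assumptions: the function names `proj` and `orth_component` are invented for this illustration, and the components are assumed to be integers or `Fraction` values so that the arithmetic stays exact (floating-point input would need a different scalar division).

```python
from fractions import Fraction


def dot(u, v):
    """Euclidean dot product."""
    return sum(ui * vi for ui, vi in zip(u, v))


def proj(u, a):
    """Vector component of u along a:  ((u . a) / ||a||^2) a."""
    norm_sq = dot(a, a)
    if norm_sq == 0:
        raise ValueError("a must be a nonzero vector")
    k = Fraction(dot(u, a), norm_sq)  # the scalar k from the proof
    return tuple(k * ai for ai in a)


def orth_component(u, a):
    """Vector component of u orthogonal to a:  u - proj_a u."""
    return tuple(ui - pi for ui, pi in zip(u, proj(u, a)))


u, a = (3, 4), (2, 0)
w1, w2 = proj(u, a), orth_component(u, a)
print(w1, w2)
print(dot(w2, a) == 0)                                  # w2 is orthogonal to a
print(tuple(x + y for x, y in zip(w1, w2)) == u)        # w1 + w2 recovers u
```

The last two checks mirror the two requirements of the Projection Theorem: the second term is orthogonal to a, and the two terms sum to u.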



► EXAMPLE 4 Orthogonal Projection on a Line 

Find the orthogonal projections of the vectors e_1 = (1, 0) and e_2 = (0, 1) on the line L
that makes an angle θ with the positive x-axis in R^2.

Solution As illustrated in Figure 3.3.3, a = (cos θ, sin θ) is a unit vector along the line
L, so our first problem is to find the orthogonal projection of e_1 along a. Since

$$\|\mathbf{a}\| = \sqrt{\sin^2\theta + \cos^2\theta} = 1 \quad\text{and}\quad \mathbf{e}_1\cdot\mathbf{a} = (1, 0)\cdot(\cos\theta, \sin\theta) = \cos\theta$$

it follows from Formula (10) that this projection is

$$\operatorname{proj}_{\mathbf{a}}\mathbf{e}_1 = \frac{\mathbf{e}_1\cdot\mathbf{a}}{\|\mathbf{a}\|^2}\,\mathbf{a} = (\cos\theta)(\cos\theta, \sin\theta) = (\cos^2\theta, \sin\theta\cos\theta)$$

Similarly, since e_2 • a = (0, 1) • (cos θ, sin θ) = sin θ, it follows from Formula (10) that

$$\operatorname{proj}_{\mathbf{a}}\mathbf{e}_2 = \frac{\mathbf{e}_2\cdot\mathbf{a}}{\|\mathbf{a}\|^2}\,\mathbf{a} = (\sin\theta)(\cos\theta, \sin\theta) = (\sin\theta\cos\theta, \sin^2\theta)$$


► EXAMPLE 5 Vector Component of u Along a 

Let u = (2, —1, 3) and a = (4, —1, 2). Find the vector component of u along a and the 
vector component of u orthogonal to a. 

Solution 

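$$\mathbf{u}\cdot\mathbf{a} = (2)(4) + (-1)(-1) + (3)(2) = 15$$

$$\|\mathbf{a}\|^2 = 4^2 + (-1)^2 + 2^2 = 21$$

Thus the vector component of u along a is

$$\operatorname{proj}_{\mathbf{a}}\mathbf{u} = \frac{\mathbf{u}\cdot\mathbf{a}}{\|\mathbf{a}\|^2}\,\mathbf{a} = \tfrac{15}{21}(4, -1, 2) = \left(\tfrac{20}{7}, -\tfrac{5}{7}, \tfrac{10}{7}\right)$$

and the vector component of u orthogonal to a is

$$\mathbf{u} - \operatorname{proj}_{\mathbf{a}}\mathbf{u} = (2, -1, 3) - \left(\tfrac{20}{7}, -\tfrac{5}{7}, \tfrac{10}{7}\right) = \left(-\tfrac{6}{7}, -\tfrac{2}{7}, \tfrac{11}{7}\right)$$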


As a check, you may wish to verify that the vectors u - proj_a u and a are perpendicular
by showing that their dot product is zero. ◄


Sometimes we will be more interested in the norm of the vector component of u
along a than in the vector component itself. A formula for this norm can be derived as
follows:

$$\|\operatorname{proj}_{\mathbf{a}}\mathbf{u}\| = \left\|\frac{\mathbf{u}\cdot\mathbf{a}}{\|\mathbf{a}\|^2}\,\mathbf{a}\right\| = \left|\frac{\mathbf{u}\cdot\mathbf{a}}{\|\mathbf{a}\|^2}\right|\,\|\mathbf{a}\| = \frac{|\mathbf{u}\cdot\mathbf{a}|}{\|\mathbf{a}\|^2}\,\|\mathbf{a}\|$$

where the second equality follows from part (c) of Theorem 3.2.1 and the third from the
fact that ||a||^2 > 0. Thus,

$$\|\operatorname{proj}_{\mathbf{a}}\mathbf{u}\| = \frac{|\mathbf{u}\cdot\mathbf{a}|}{\|\mathbf{a}\|} \tag{12}$$

If θ denotes the angle between u and a, then u • a = ||u|| ||a|| cos θ, so (12) can also be
written as

$$\|\operatorname{proj}_{\mathbf{a}}\mathbf{u}\| = \|\mathbf{u}\|\,|\cos\theta| \tag{13}$$

(Verify.) A geometric interpretation of this result is given in Figure 3.3.4.

▲ Figure 3.3.4 [panel (b): π/2 < θ ≤ π, where the projection has length -||u|| cos θ]


The Theorem of Pythagoras

In Section 3.2 we found that many theorems about vectors in R^2 and R^3 also hold in R^n.
Another example of this is the following generalization of the Theorem of Pythagoras
(Figure 3.3.5).

▲ Figure 3.3.5


THEOREM 3.3.3 Theorem of Pythagoras in R^n

If u and v are orthogonal vectors in R^n with the Euclidean inner product, then

$$\|\mathbf{u} + \mathbf{v}\|^2 = \|\mathbf{u}\|^2 + \|\mathbf{v}\|^2 \tag{14}$$

Proof Since u and v are orthogonal, we have u • v = 0, from which it follows that

$$\|\mathbf{u} + \mathbf{v}\|^2 = (\mathbf{u} + \mathbf{v})\cdot(\mathbf{u} + \mathbf{v}) = \|\mathbf{u}\|^2 + 2(\mathbf{u}\cdot\mathbf{v}) + \|\mathbf{v}\|^2 = \|\mathbf{u}\|^2 + \|\mathbf{v}\|^2$$

► EXAMPLE 6 Theorem of Pythagoras in R^4

We showed in Example 1 that the vectors

u = (-2, 3, 1, 4) and v = (1, 2, 0, -1)

are orthogonal. Verify the Theorem of Pythagoras for these vectors.

Solution We leave it for you to confirm that

u + v = (-1, 5, 1, 3)

||u + v||^2 = 36

||u||^2 + ||v||^2 = 30 + 6

Thus, ||u + v||^2 = ||u||^2 + ||v||^2 ◄


OPTIONAL

Distance Problems

We will now show how orthogonal projections can be used to solve the following three
distance problems:

Problem 1. Find the distance between a point and a line in R^2.

Problem 2. Find the distance between a point and a plane in R^3.

Problem 3. Find the distance between two parallel planes in R^3.

A method for solving the first two problems is provided by the next theorem. Since the
proofs of the two parts are similar, we will prove part (b) and leave part (a) as an exercise.


THEOREM 3.3.4

(a) In R^2 the distance D between the point P_0(x_0, y_0) and the line ax + by + c = 0
is

$$D = \frac{|ax_0 + by_0 + c|}{\sqrt{a^2 + b^2}} \tag{15}$$

(b) In R^3 the distance D between the point P_0(x_0, y_0, z_0) and the plane
ax + by + cz + d = 0 is

$$D = \frac{|ax_0 + by_0 + cz_0 + d|}{\sqrt{a^2 + b^2 + c^2}} \tag{16}$$


▲ Figure 3.3.6


Proof (b) The underlying idea of the proof is illustrated in Figure 3.3.6. As shown in
that figure, let Q(x_1, y_1, z_1) be any point in the plane, and let n = (a, b, c) be a normal
vector to the plane that is positioned with its initial point at Q. It is now evident that the
distance D between P_0 and the plane is simply the length (or norm) of the orthogonal
projection of the vector QP_0 on n, which by Formula (12) is

$$D = \|\operatorname{proj}_{\mathbf{n}}\overrightarrow{QP_0}\| = \frac{|\overrightarrow{QP_0}\cdot\mathbf{n}|}{\|\mathbf{n}\|}$$

But

$$\overrightarrow{QP_0} = (x_0 - x_1,\; y_0 - y_1,\; z_0 - z_1)$$

$$\overrightarrow{QP_0}\cdot\mathbf{n} = a(x_0 - x_1) + b(y_0 - y_1) + c(z_0 - z_1)$$

$$\|\mathbf{n}\| = \sqrt{a^2 + b^2 + c^2}$$

Thus

$$D = \frac{|a(x_0 - x_1) + b(y_0 - y_1) + c(z_0 - z_1)|}{\sqrt{a^2 + b^2 + c^2}} \tag{17}$$

Since the point Q(x_1, y_1, z_1) lies in the given plane, its coordinates satisfy the equation
of that plane; thus

ax_1 + by_1 + cz_1 + d = 0

or

d = -ax_1 - by_1 - cz_1

Substituting this expression in (17) yields (16).


► EXAMPLE 7 Distance Between a Point and a Plane 

Find the distance D between the point (1, -4, -3) and the plane 2x - 3y + 6z = -1.

Solution Since the distance formulas in Theorem 3.3.4 require that the equations of the
line and plane be written with zero on the right side, we first need to rewrite the equation
of the plane as

2x - 3y + 6z + 1 = 0

from which we obtain

$$D = \frac{|2(1) + (-3)(-4) + 6(-3) + 1|}{\sqrt{2^2 + (-3)^2 + 6^2}} = \frac{|-3|}{7} = \frac{3}{7}$$ ◄


▲ Figure 3.3.7 The distance between the parallel planes V and W is equal to the distance
between P_0 and W.


The third distance problem posed above is to find the distance between two parallel
planes in R^3. As suggested in Figure 3.3.7, the distance between a plane V and a plane
W can be obtained by finding any point P_0 in one of the planes, and computing the
distance between that point and the other plane. Here is an example.

► EXAMPLE 8 Distance Between Parallel Planes 

The planes 

x + 2y — 2z = 3 and 2x + 4y — 4z = 7 

are parallel since their normals, (1, 2, —2) and (2, 4, —4), are parallel vectors. Find the 
distance between these planes. 


Solution To find the distance D between the planes, we can select an arbitrary point in
one of the planes and compute its distance to the other plane. By setting y = z = 0 in
the equation x + 2y - 2z = 3, we obtain the point P_0(3, 0, 0) in this plane. From (16),
the distance between P_0 and the plane 2x + 4y - 4z = 7 is

$$D = \frac{|2(3) + 4(0) + (-4)(0) - 7|}{\sqrt{2^2 + 4^2 + (-4)^2}} = \frac{1}{6}$$
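The distance formulas of Theorem 3.3.4 and the parallel-plane procedure of Example 8 can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, floating-point arithmetic is assumed, and `distance_between_parallel_planes` trusts the caller to supply planes that really are parallel.

```python
import math


def distance_to_hyperplane(point, coeffs, d):
    """Distance from `point` to the set of x with  coeffs . x + d = 0.

    With two components this is the point-to-line formula in R^2;
    with three components it is the point-to-plane formula in R^3.
    """
    norm = math.sqrt(sum(c * c for c in coeffs))
    if norm == 0:
        raise ValueError("coefficients must not all be zero")
    return abs(sum(c * p for c, p in zip(coeffs, point)) + d) / norm


def point_on_plane(coeffs, d):
    """Pick one point on  coeffs . x + d = 0  by zeroing all but one coordinate."""
    for i, c in enumerate(coeffs):
        if c != 0:
            point = [0.0] * len(coeffs)
            point[i] = -d / c
            return tuple(point)
    raise ValueError("coefficients must not all be zero")


def distance_between_parallel_planes(coeffs1, d1, coeffs2, d2):
    """Distance between two planes assumed (not checked) to be parallel."""
    return distance_to_hyperplane(point_on_plane(coeffs1, d1), coeffs2, d2)


# Example 7: the point (1, -4, -3) and the plane 2x - 3y + 6z + 1 = 0
print(distance_to_hyperplane((1, -4, -3), (2, -3, 6), 1))
# Example 8: the planes x + 2y - 2z - 3 = 0 and 2x + 4y - 4z - 7 = 0
print(distance_between_parallel_planes((1, 2, -2), -3, (2, 4, -4), -7))
```

Note that each plane must first be written with zero on the right side, exactly as the solution of Example 7 requires.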


Exercise Set 3.3 

In Exercises 1-2, determine whether u and v are orthogonal 
vectors. 

1. (a) u = (6, 1, 4), v = (2, 0, -3)

(b) u = (0, 0, -1), v = (1, 1, 1)

(c) u = (3, -2, 1, 3), v = (-4, 1, -3, 7)

(d) u = (5, -4, 0, 3), v = (-4, 1, -3, 7)

2. (a) u = (2, 3), v = (5, -7)

(b) u = (1, 1, 1), v = (0, 0, 0)

(c) u = (1, -5, 4), v = (3, 3, 3)

(d) u = (4, 1, -2, 5), v = (-1, 5, 3, 1)

In Exercises 3-6, find a point-normal form of the equation of
the plane passing through P and having n as a normal.

3. P(-1, 3, -2); n = (-2, 1, -1)

4. P(1, 1, 4); n = (1, 9, 8)

5. P(2, 0, 0); n = (0, 0, 2)

6. P(0, 0, 0); n = (1, 2, 3)

In Exercises 7-10, determine whether the given planes are
parallel.

7. 4x - y + 2z = 5 and 7x - 3y + 4z = 8

8. x - 4y - 3z - 2 = 0 and 3x - 12y - 9z - 7 = 0

9. 2y = 8x - 4z + 5 and x = (1/2)z + (1/4)y

10. (-4, 1, 2) • (x, y, z) = 0 and (8, -2, -4) • (x, y, z) = 0

In Exercises 11-12, determine whether the given planes are
perpendicular.

11. 3x - y + z - 4 = 0, x + 2z = -1

12. x - 2y + 3z = 4, -2x + 5y + 4z = -1

In Exercises 13-14, find ||proj_a u||.

13. (a) u= (1,-2), a = (-4,-3) 

(b) u= (3,0,4), a = (2,3,3) 

14. (a) u= (5,6), a = (2,-1) 

(b) u= (3, -2, 6), a = (1,2, -7) 

In Exercises 15-20, find the vector component of u along a and 
the vector component of u orthogonal to a. 

15. u = (6, 2), a = (3, -9) 16. u = (-1, -2), a = (-2, 3) 

17. u= (3, 1,-7), a = (1,0,5) 

18. u = (2,0, 1), a = (1,2, 3) 

19. u = (2, 1, 1,2), a = (4, -4, 2, -2) 

20. u = (5, 0, -3, 7), a = (2, 1, -1, -1) 

In Exercises 21-24, find the distance between the point and the 
line. 

21. (-3, 1); 4x + 3y + 4 = 0 

22. (-1,4); x - 3y + 2 = 0 

23. (2, -5); y = -4x + 2 

24. (1,8); 3x + y = 5 

In Exercises 25-26, find the distance between the point and the 
plane. 

25. (3, 1, -2); x + 2y - 2z = 4

26. (-1, -1, 2); 2x + 5y - 6z = 4

In Exercises 27-28, find the distance between the given parallel 
planes. 

27. 2x — y — z = 5 and —4x + 2y + 2z = 12 

28. 2x — y + z = 1 and 2x — y + z = — 1 

29. Find a unit vector that is orthogonal to both u = (1, 0, 1) and
v = (0, 1, 1).

30. (a) Show that v = (a, b) and w = (-b, a) are orthogonal
vectors.

(b) Use the result in part (a) to find two vectors that are
orthogonal to v = (2, -3).

(c) Find two unit vectors that are orthogonal to v = (-3, 4).

31. Do the points A(1, 1, 1), B(-2, 0, 3), and C(-3, -1, 1) form
the vertices of a right triangle? Explain.

32. Repeat Exercise 31 for the points A(3, 0, 2), B(4, 3, 0), and
C(8, 1, -1).

33. Show that if v is orthogonal to both w_1 and w_2, then v is
orthogonal to k_1w_1 + k_2w_2 for all scalars k_1 and k_2.

34. Is it possible to have proj_a u = proj_u a? Explain.

Exercises 35-37 In physics and engineering the work W performed by a constant force F
applied in the direction of motion to an object moving a distance d on a straight line is
defined to be

W = ||F|| d (force magnitude times distance)

In the case where the applied force is constant but makes an angle θ with the direction of
motion, and where the object moves along a line from a point P to a point Q, we call PQ
the displacement and define the work performed by the force to be

W = F • PQ = ||F|| ||PQ|| cos θ

(see accompanying figure). Common units of work are ft-lb (foot pounds) or Nm (Newton meters).

[Figure: a force F applied at an angle θ to the displacement PQ; the component of F along
PQ has magnitude ||F|| cos θ, and Work = (||F|| cos θ) ||PQ||]


35. Show that the work performed by a constant force (not necessarily
in the direction of motion) can be expressed as

W = ±||PQ|| ||proj_PQ F||

and explain when the + sign should be used and when the - sign should be used.


36. As illustrated in the accompanying figure, a wagon is pulled
horizontally by exerting a force of 10 lb on the handle at an
angle of 60° with the horizontal. How much work is done in
moving the wagon 50 ft?

[Figure: a wagon pulled a distance of 50 ft by a force F applied to the handle]


37. A sailboat travels 100 m due north while the wind exerts a 
force of 500 N toward the northeast. How much work does 
the wind do? 

Working with Proofs 

38. Let u and v be nonzero vectors in 2- or 3-space, and let k = ||u||
and l = ||v||. Prove that the vector w = lu + kv bisects the
angle between u and v.

39. Prove part (a) of Theorem 3.3.4. 

True-False Exercises 

TF. In parts (a)-(g) determine whether the statement is true or 
false, and justify your answer. 

(a) The vectors (3, —1,2) and (0, 0, 0) are orthogonal. 

(b) If u and v are orthogonal vectors, then for all nonzero scalars
k and m, ku and mv are orthogonal vectors.

(c) The orthogonal projection of u on a is perpendicular to the 
vector component of u orthogonal to a. 

(d) If a and b are orthogonal vectors, then for every nonzero vector 
u, we have 

proj a (proj b (u)) = 0 

(e) If a and u are nonzero vectors, then 

proj a (proj a (u)) = proj a (u) 

(f ) If the relationship 

proj a u = proj a v 

holds for some nonzero vector a, then u = v. 

(g) For all vectors u and v, it is true that

||u + v|| = ||u|| + ||v||

Working with Technology 

T1. Find the lengths of the sides and the interior angles of the
triangle in R^4 whose vertices are

P(2, 4, 2, 4, 2), Q(6, 4, 4, 4, 6), R(5, 7, 5, 7, 2)

T2. Express the vector u = (2, 3, 1, 2) in the form u = w_1 + w_2,
where w_1 is a scalar multiple of a = (-1, 0, 2, 1) and w_2 is orthogonal
to a.


3.4 The Geometry of Linear Systems

In this section we will use parametric and vector methods to study general systems of linear
equations. This work will enable us to interpret solution sets of linear systems with n
unknowns as geometric objects in R^n just as we interpreted solution sets of linear systems
with two and three unknowns as points, lines, and planes in R^2 and R^3.


Vector and Parametric Equations of Lines in R^2 and R^3

In the last section we derived equations of lines and planes that are determined by a
point and a normal vector. However, there are other useful ways of specifying lines and
planes. For example, a unique line in R^2 or R^3 is determined by a point x_0 on the line and
a nonzero vector v parallel to the line, and a unique plane in R^3 is determined by a point
x_0 in the plane and two noncollinear vectors v_1 and v_2 parallel to the plane. The best way
to visualize this is to translate the vectors so their initial points are at x_0 (Figure 3.4.1).




► Figure 3.4.1 



Let us begin by deriving an equation for the line L that contains the point x_0 and is
parallel to v. If x is a general point on such a line, then, as illustrated in Figure 3.4.2, the
vector x - x_0 will be some scalar multiple of v, say

x - x_0 = tv or equivalently x = x_0 + tv

As the variable t (called a parameter) varies from -∞ to ∞, the point x traces out the
line L. Accordingly, we have the following result.


THEOREM 3.4.1 Let L be the line in R^2 or R^3 that contains the point x_0 and is parallel
to the nonzero vector v. Then the equation of the line through x_0 that is parallel to v is

x = x_0 + tv (1)

If x_0 = 0, then the line passes through the origin and the equation has the form

x = tv (2)


Although it is not stated explicitly, it is understood in Formulas (1) and (2) that the
parameter t varies from -∞ to ∞. This applies to all vector and parametric equations
in this text except where stated otherwise.


Vector and Parametric Equations of Planes in R^3

Next we will derive an equation for the plane W that contains the point x_0 and is parallel
to the noncollinear vectors v_1 and v_2. As shown in Figure 3.4.3, if x is any point in the
plane, then by forming suitable scalar multiples of v_1 and v_2, say t_1v_1 and t_2v_2, we can
create a parallelogram with diagonal x - x_0 and adjacent sides t_1v_1 and t_2v_2. Thus, we
have

x - x_0 = t_1v_1 + t_2v_2 or equivalently x = x_0 + t_1v_1 + t_2v_2

As the variables t_1 and t_2 (called parameters) vary independently from -∞ to ∞, the
point x varies over the entire plane W. In summary, we have the following result.


THEOREM 3.4.2 Let W be the plane in R^3 that contains the point x_0 and is parallel
to the noncollinear vectors v_1 and v_2. Then an equation of the plane through x_0 that is
parallel to v_1 and v_2 is given by

x = x_0 + t_1v_1 + t_2v_2 (3)

If x_0 = 0, then the plane passes through the origin and the equation has the form

x = t_1v_1 + t_2v_2 (4)


Remark Observe that the line through x_0 represented by Equation (1) is the translation by x_0 of
the line through the origin represented by Equation (2) and that the plane through x_0 represented
by Equation (3) is the translation by x_0 of the plane through the origin represented by Equation
(4) (Figure 3.4.4).


► Figure 3.4.4 




Motivated by the forms of Formulas (1) to (4), we can extend the notions of line and
plane to R^n by making the following definitions.


DEFINITION 1 If x_0 and v are vectors in R^n, and if v is nonzero, then the equation

x = x_0 + tv (5)

defines the line through x_0 that is parallel to v. In the special case where x_0 = 0, the
line is said to pass through the origin.


DEFINITION 2 If x_0, v_1, and v_2 are vectors in R^n, and if v_1 and v_2 are not collinear,
then the equation

x = x_0 + t_1v_1 + t_2v_2 (6)

defines the plane through x_0 that is parallel to v_1 and v_2. In the special case where
x_0 = 0, the plane is said to pass through the origin.


Equations (5) and (6) are called vector forms of a line and plane in R n . If the vectors 
in these equations are expressed in terms of their components and the corresponding 
components on each side are equated, then the resulting equations are called parametric 
equations of the line and plane. Here are some examples. 

► EXAMPLE 1 Vector and Parametric Equations of Lines in R 2 and R 3 

(a) Find a vector equation and parametric equations of the line in R 2 that passes 
through the origin and is parallel to the vector v = (—2, 3). 

(b) Find a vector equation and parametric equations of the line in R 3 that passes 
through the point P 0 (l, 2, —3) and is parallel to the vector v = (4, —5, 1). 

(c) Use the vector equation obtained in part (b) to find two points on the line that are
different from P_0.

Solution (a) It follows from (5) with x_0 = 0 that a vector equation of the line is x = tv.
If we let x = (x, y), then this equation can be expressed in vector form as

(x, y) = t(-2, 3)

Equating corresponding components on the two sides of this equation yields the
parametric equations

x = -2t, y = 3t

Solution (b) It follows from (5) that a vector equation of the line is x = x_0 + tv. If we
let x = (x, y, z), and if we take x_0 = (1, 2, -3), then this equation can be expressed in
vector form as

(x, y, z) = (1, 2, -3) + t(4, -5, 1) (7)

Equating corresponding components on the two sides of this equation yields the
parametric equations

x = 1 + 4t, y = 2 - 5t, z = -3 + t

Solution (c) A point on the line represented by Equation (7) can be obtained by
substituting a specific numerical value for the parameter t. However, since t = 0 produces
(x, y, z) = (1, 2, -3), which is the point P_0, this value of t does not serve our purpose.
Taking t = 1 produces the point (5, -3, -2) and taking t = -1 produces the point
(-3, 7, -4). Any other distinct values for t (except t = 0) would work just as well.
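Part (c) amounts to evaluating the vector equation at chosen parameter values, which is easy to mirror in code. The short Python sketch below is illustrative only; `point_on_line` is a hypothetical helper name, not notation from the text.

```python
def point_on_line(x0, v, t):
    """Evaluate the vector equation x = x0 + t v at the parameter value t."""
    if len(x0) != len(v):
        raise ValueError("x0 and v must have the same number of components")
    if all(c == 0 for c in v):
        raise ValueError("the direction vector v must be nonzero")
    return tuple(p + t * d for p, d in zip(x0, v))


x0, v = (1, 2, -3), (4, -5, 1)   # the line of part (b)
for t in (0, 1, -1, 2):
    print(t, point_on_line(x0, v, t))
```

Because the function never looks at the number of components, the same evaluation applies unchanged to a line in R^n.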


► EXAMPLE 2 Vector and Parametric Equations of a Plane in R^3

Find vector and parametric equations of the plane x - y + 2z = 5.

Solution We will find the parametric equations first. We can do this by solving the
equation for any one of the variables in terms of the other two and then using those two
variables as parameters. For example, solving for x in terms of y and z yields

x = 5 + y - 2z (8)

and then using y and z as parameters t_1 and t_2, respectively, yields the parametric
equations

x = 5 + t_1 - 2t_2, y = t_1, z = t_2

To obtain a vector equation of the plane we rewrite these parametric equations as

(x, y, z) = (5 + t_1 - 2t_2, t_1, t_2)

or, equivalently, as

(x, y, z) = (5, 0, 0) + t_1(1, 1, 0) + t_2(-2, 0, 1)

We would have obtained different parametric and vector equations in Example 2 had we
solved (8) for y or z rather than x. However, one can show the same plane results in all three
cases as the parameters vary from -∞ to ∞.
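One way to gain confidence in a parametrization such as the one in Example 2 is to substitute sample parameter values and check the original equation. The following Python sketch does that; the names `plane_point` and `on_plane` are illustrative assumptions, and a finite sample of parameter values is a spot-check, not a proof.

```python
def plane_point(x0, v1, v2, t1, t2):
    """Evaluate the vector equation x = x0 + t1 v1 + t2 v2."""
    return tuple(p + t1 * a + t2 * b for p, a, b in zip(x0, v1, v2))


def on_plane(point):
    """Check the implicit equation x - y + 2z = 5 of Example 2."""
    x, y, z = point
    return x - y + 2 * z == 5


x0, v1, v2 = (5, 0, 0), (1, 1, 0), (-2, 0, 1)
samples = [(t1, t2) for t1 in range(-3, 4) for t2 in range(-3, 4)]
print(all(on_plane(plane_point(x0, v1, v2, t1, t2)) for t1, t2 in samples))
```

The check succeeds for every parameter pair because (5 + t_1 - 2t_2) - t_1 + 2t_2 simplifies to 5 identically.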


► EXAMPLE 3 Vector and Parametric Equations of Lines and Planes in R 4 

(a) Find vector and parametric equations of the line through the origin of R 4 that is 
parallel to the vector v = (5, —3, 6, 1). 

(b) Find vector and parametric equations of the plane in R 4 that passes through the point 
x 0 = (2, —1, 0, 3) and is parallel to both vi = (1, 5, 2, —4) and v 2 = (0, 7, —8, 6). 

Solution (a) If we let x = (x_1, x_2, x_3, x_4), then the vector equation x = tv can be
expressed as

(x_1, x_2, x_3, x_4) = t(5, -3, 6, 1)

Equating corresponding components yields the parametric equations

x_1 = 5t, x_2 = -3t, x_3 = 6t, x_4 = t

Solution (b) The vector equation x = x_0 + t_1v_1 + t_2v_2 can be expressed as

(x_1, x_2, x_3, x_4) = (2, -1, 0, 3) + t_1(1, 5, 2, -4) + t_2(0, 7, -8, 6)

which yields the parametric equations

x_1 = 2 + t_1
x_2 = -1 + 5t_1 + 7t_2
x_3 = 2t_1 - 8t_2
x_4 = 3 - 4t_1 + 6t_2


Lines Through Two Points in R^n

▲ Figure 3.4.5

▲ Figure 3.4.6

If x_0 and x_1 are distinct points in R^n, then the line determined by these points is parallel to
the vector v = x_1 - x_0 (Figure 3.4.5), so it follows from (5) that the line can be expressed
in vector form as

x = x_0 + t(x_1 - x_0) (9)

or, equivalently, as

x = (1 - t)x_0 + tx_1 (10)

These are called the two-point vector equations of a line in R^n.
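The equivalence of the two forms follows by distributing t, and it can also be spot-checked numerically. Below is a brief Python sketch; the function names `line_point_9` and `line_point_10` and the sample points are invented for this illustration, and `Fraction` parameters are used so that equality tests are exact.

```python
from fractions import Fraction


def line_point_9(x0, x1, t):
    """First two-point form:  x = x0 + t (x1 - x0)."""
    return tuple(a + t * (b - a) for a, b in zip(x0, x1))


def line_point_10(x0, x1, t):
    """Second two-point form:  x = (1 - t) x0 + t x1."""
    return tuple((1 - t) * a + t * b for a, b in zip(x0, x1))


x0, x1 = (1, 0, 2, -1), (3, 4, 0, 5)   # two distinct sample points in R^4
params = (Fraction(0), Fraction(1, 2), Fraction(1), Fraction(-3, 2))
print(all(line_point_9(x0, x1, t) == line_point_10(x0, x1, t) for t in params))
print(line_point_9(x0, x1, Fraction(1, 2)))   # the point halfway from x0 to x1
```

Setting t = 0 returns x_0 and t = 1 returns x_1 in either form, which is a quick sanity check on any implementation.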


► EXAMPLE 4 A Line Through Two Points in R 2 

Find vector and parametric equations for the line in R^2 that passes through the points
P(0, 7) and Q(5, 0).

Solution We will see below that it does not matter which point we take to be x_0 and
which we take to be x_1, so let us choose x_0 = (0, 7) and x_1 = (5, 0). It follows that
x_1 - x_0 = (5, -7) and hence that

(x, y) = (0, 7) + t(5, -7) (11)

which we can rewrite in parametric form as

x = 5t, y = 7 - 7t

Had we reversed our choices and taken x_0 = (5, 0) and x_1 = (0, 7), then the resulting
vector equation would have been

(x, y) = (5, 0) + t(-5, 7) (12)

and the parametric equations would have been

x = 5 - 5t, y = 7t

(verify). Although (11) and (12) look different, they both represent the line whose
equation in rectangular coordinates is

7x + 5y = 35

(Figure 3.4.6). This can be seen by eliminating the parameter t from the parametric
equations (verify).

The point x = (x, y) in Equations (9) and (10) traces an entire line in R^2 as the
parameter t varies over the interval (-∞, ∞). If, however, we restrict the parameter to
vary from t = 0 to t = 1, then x will not trace the entire line but rather just the line
segment joining the points x_0 and x_1. The point x will start at x_0 when t = 0 and end at
x_1 when t = 1. Accordingly, we make the following definition.


DEFINITION 3 If x_0 and x_1 are vectors in R^n, then the equation

x = x_0 + t(x_1 - x_0) (0 ≤ t ≤ 1) (13)

defines the line segment from x_0 to x_1. When convenient, Equation (13) can be written
as

x = (1 - t)x_0 + tx_1 (0 ≤ t ≤ 1) (14)


► EXAMPLE 5 A Line Segment from One Point to Another in R^2

It follows from (13) and (14) that the line segment in R^2 from x_0 = (1, -3) to x_1 = (5, 6)
can be represented either by the equation

x = (1, -3) + t(4, 9) (0 ≤ t ≤ 1)

or by the equation

x = (1 - t)(1, -3) + t(5, 6) (0 ≤ t ≤ 1) ◄


Dot Product Form of a Linear System

Our next objective is to show how to express linear equations and linear systems in dot
product notation. This will lead us to some important results about orthogonality and
linear systems.

Recall that a linear equation in the variables x_1, x_2, ..., x_n has the form

a_1x_1 + a_2x_2 + ... + a_nx_n = b (a_1, a_2, ..., a_n not all zero) (15)

and that the corresponding homogeneous equation is

a_1x_1 + a_2x_2 + ... + a_nx_n = 0 (a_1, a_2, ..., a_n not all zero) (16)

These equations can be rewritten in vector form by letting

a = (a_1, a_2, ..., a_n) and x = (x_1, x_2, ..., x_n)

in which case Formula (15) can be written as

a • x = b (17)

and Formula (16) as

a • x = 0 (18)

Except for a notational change from n to a, Formula (18) is the extension to R^n of Formula
(6) in Section 3.3. This equation reveals that each solution vector x of a homogeneous
equation is orthogonal to the coefficient vector a. To take this geometric observation a
step further, consider the homogeneous system

a_11x_1 + a_12x_2 + ... + a_1nx_n = 0
a_21x_1 + a_22x_2 + ... + a_2nx_n = 0
...
a_m1x_1 + a_m2x_2 + ... + a_mnx_n = 0

If we denote the successive row vectors of the coefficient matrix by r_1, r_2, ..., r_m, then
we can rewrite this system in dot product form as

r_1 • x = 0
r_2 • x = 0
...
r_m • x = 0 (19)

from which we see that every solution vector x is orthogonal to every row vector of the
coefficient matrix. In summary, we have the following result.

THEOREM 3.4.3 If A is an m × n matrix, then the solution set of the homogeneous
linear system Ax = 0 consists of all vectors in R^n that are orthogonal to every row
vector of A.


► EXAMPLE 6 Orthogonality of Row Vectors and Solution Vectors 

We showed in Example 6 of Section 1.2 that the general solution of the homogeneous
linear system

$$\begin{bmatrix} 1 & 3 & -2 & 0 & 2 & 0 \\ 2 & 6 & -5 & -2 & 4 & -3 \\ 0 & 0 & 5 & 10 & 0 & 15 \\ 2 & 6 & 0 & 8 & 4 & 18 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}$$

is

x_1 = -3r - 4s - 2t, x_2 = r, x_3 = -2s, x_4 = s, x_5 = t, x_6 = 0

which we can rewrite in vector form as

x = (-3r - 4s - 2t, r, -2s, s, t, 0)

According to Theorem 3.4.3, the vector x must be orthogonal to each of the row vectors

r_1 = (1, 3, -2, 0, 2, 0)
r_2 = (2, 6, -5, -2, 4, -3)
r_3 = (0, 0, 5, 10, 0, 15)
r_4 = (2, 6, 0, 8, 4, 18)

We will confirm that x is orthogonal to r_1, and leave it for you to verify that x is orthogonal
to the other three row vectors as well. The dot product of r_1 and x is

r_1 • x = 1(-3r - 4s - 2t) + 3(r) + (-2)(-2s) + 0(s) + 2(t) + 0(0) = 0

which establishes the orthogonality.
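The remaining three dot products can also be spot-checked by machine. The Python sketch below evaluates the general solution at a few parameter values and tests orthogonality against every row; the helper names are hypothetical, and checking finitely many parameter values illustrates the theorem without proving it.

```python
# Row vectors of the coefficient matrix in Example 6.
A = [
    (1, 3, -2, 0, 2, 0),
    (2, 6, -5, -2, 4, -3),
    (0, 0, 5, 10, 0, 15),
    (2, 6, 0, 8, 4, 18),
]


def dot(u, v):
    """Euclidean dot product."""
    return sum(ui * vi for ui, vi in zip(u, v))


def general_solution(r, s, t):
    """General solution of Ax = 0 from Example 6, for parameters r, s, t."""
    return (-3 * r - 4 * s - 2 * t, r, -2 * s, s, t, 0)


def orthogonal_to_all_rows(x):
    """True when x is orthogonal to every row vector of A."""
    return all(dot(row, x) == 0 for row in A)


samples = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (2, -3, 5)]
print(all(orthogonal_to_all_rows(general_solution(*p)) for p in samples))
```

Choosing (r, s, t) = (1, 0, 0), (0, 1, 0), and (0, 0, 1) tests the three vectors that generate the solution set, and orthogonality to those three carries over to every linear combination of them.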


The Relationship Between Ax = 0 and Ax = b

We will conclude this section by exploring the relationship between the solutions of
a homogeneous linear system Ax = 0 and the solutions (if any) of a nonhomogeneous
linear system Ax = b that has the same coefficient matrix. These are called corresponding
linear systems.

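To motivate the result we are seeking, let us compare the solutions of the corresponding
linear systems

\[
\begin{bmatrix}
1 & 3 & -2 & 0 & 2 & 0 \\
2 & 6 & -5 & -2 & 4 & -3 \\
0 & 0 & 5 & 10 & 0 & 15 \\
2 & 6 & 0 & 8 & 4 & 18
\end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \end{bmatrix}
=
\begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}
\quad\text{and}\quad
\begin{bmatrix}
1 & 3 & -2 & 0 & 2 & 0 \\
2 & 6 & -5 & -2 & 4 & -3 \\
0 & 0 & 5 & 10 & 0 & 15 \\
2 & 6 & 0 & 8 & 4 & 18
\end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \end{bmatrix}
=
\begin{bmatrix} 0 \\ -1 \\ 5 \\ 6 \end{bmatrix}
\]

We showed in Examples 5 and 6 of Section 1.2 that the general solutions of these linear
systems can be written in parametric form as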


\[
\begin{aligned}
\text{homogeneous:}\quad & x_1 = -3r - 4s - 2t,\; x_2 = r,\; x_3 = -2s,\; x_4 = s,\; x_5 = t,\; x_6 = 0 \\
\text{nonhomogeneous:}\quad & x_1 = -3r - 4s - 2t,\; x_2 = r,\; x_3 = -2s,\; x_4 = s,\; x_5 = t,\; x_6 = \tfrac{1}{3}
\end{aligned}
\]

which we can then rewrite in vector form as

\[
\begin{aligned}
\text{homogeneous:}\quad & (x_1, x_2, x_3, x_4, x_5, x_6) = (-3r - 4s - 2t,\; r,\; -2s,\; s,\; t,\; 0) \\
\text{nonhomogeneous:}\quad & (x_1, x_2, x_3, x_4, x_5, x_6) = (-3r - 4s - 2t,\; r,\; -2s,\; s,\; t,\; \tfrac{1}{3})
\end{aligned}
\]

By splitting the vectors on the right apart and collecting terms with like parameters, we
can rewrite these equations as

\[
\text{homogeneous:}\quad (x_1, \ldots, x_6) = r(-3, 1, 0, 0, 0, 0) + s(-4, 0, -2, 1, 0, 0) + t(-2, 0, 0, 0, 1, 0) \tag{20}
\]

\[
\text{nonhomogeneous:}\quad (x_1, \ldots, x_6) = r(-3, 1, 0, 0, 0, 0) + s(-4, 0, -2, 1, 0, 0) + t(-2, 0, 0, 0, 1, 0) + (0, 0, 0, 0, 0, \tfrac{1}{3}) \tag{21}
\]

Formulas (20) and (21) reveal that each solution of the nonhomogeneous system can be
obtained by adding the fixed vector $(0, 0, 0, 0, 0, \tfrac{1}{3})$ to the corresponding solution of the
homogeneous system. This is a special case of the following general result.


THEOREM 3.4.4 The general solution of a consistent linear system Ax = b can be
obtained by adding any specific solution of Ax = b to the general solution of Ax = 0.


Proof Let $\mathbf{x}_0$ be any specific solution of Ax = b, let W denote the solution set of Ax = 0,
and let $\mathbf{x}_0 + W$ denote the set of all vectors that result by adding $\mathbf{x}_0$ to each vector in
W. We must show that if x is a vector in $\mathbf{x}_0 + W$, then x is a solution of Ax = b, and
conversely that every solution of Ax = b is in the set $\mathbf{x}_0 + W$.

Assume first that x is a vector in $\mathbf{x}_0 + W$. This implies that x is expressible in the
form $\mathbf{x} = \mathbf{x}_0 + \mathbf{w}$, where $A\mathbf{x}_0 = \mathbf{b}$ and $A\mathbf{w} = \mathbf{0}$. Thus,

\[
A\mathbf{x} = A(\mathbf{x}_0 + \mathbf{w}) = A\mathbf{x}_0 + A\mathbf{w} = \mathbf{b} + \mathbf{0} = \mathbf{b}
\]

which shows that x is a solution of Ax = b.

Conversely, let x be any solution of Ax = b. To show that x is in the set $\mathbf{x}_0 + W$ we
must show that x is expressible in the form

\[
\mathbf{x} = \mathbf{x}_0 + \mathbf{w} \tag{22}
\]

where w is in W (i.e., $A\mathbf{w} = \mathbf{0}$). We can do this by taking $\mathbf{w} = \mathbf{x} - \mathbf{x}_0$. This vector
satisfies (22), and it is in W since

\[
A\mathbf{w} = A(\mathbf{x} - \mathbf{x}_0) = A\mathbf{x} - A\mathbf{x}_0 = \mathbf{b} - \mathbf{b} = \mathbf{0}
\]

▲ Figure 3.4.7 The solution set of Ax = b is a translation of the solution space of Ax = 0.

Remark Theorem 3.4.4 has a useful geometric interpretation that is illustrated in Figure 3.4.7.
If, as discussed in Section 3.1, we interpret vector addition as translation, then the theorem states
that if $\mathbf{x}_0$ is any specific solution of Ax = b, then the entire solution set of Ax = b can be obtained
by translating the solution space of Ax = 0 by the vector $\mathbf{x}_0$.
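Both directions of Theorem 3.4.4 can be tried out on the corresponding systems discussed above. This is only a sketch under stated assumptions: the right-hand side b = (0, -1, 5, 6) and the particular solution (0, 0, 0, 0, 0, 1/3) are taken to be those of the nonhomogeneous system from Section 1.2, and the helper names `matvec`, `homogeneous`, and `add` are hypothetical.

```python
from fractions import Fraction

A = [
    [1, 3, -2, 0, 2, 0],
    [2, 6, -5, -2, 4, -3],
    [0, 0, 5, 10, 0, 15],
    [2, 6, 0, 8, 4, 18],
]
b = [0, -1, 5, 6]                      # assumed right-hand side
x0 = [0, 0, 0, 0, 0, Fraction(1, 3)]   # assumed particular solution of Ax = b


def matvec(M, x):
    """Matrix-vector product, computed row by row."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def homogeneous(r, s, t):
    """A vector w in the solution space W of Ax = 0."""
    return [-3 * r - 4 * s - 2 * t, r, -2 * s, s, t, 0]


def add(u, v):
    return [a + c for a, c in zip(u, v)]


# Direction 1: x0 + w solves Ax = b for every w in W.
translated = [add(x0, homogeneous(r, s, t)) for (r, s, t) in [(0, 0, 0), (1, -2, 3), (5, 0, -1)]]
images = [matvec(A, x) for x in translated]

# Direction 2: the difference of two solutions of Ax = b solves Ax = 0.
difference = [p - q for p, q in zip(translated[1], translated[2])]
print(images, matvec(A, difference))
```

Exact rational arithmetic (`Fraction`) is used so that the comparison with b does not depend on floating-point rounding.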


Exercise Set 3.4 

In Exercises 1-4, find vector and parametric equations of the 
line containing the point and parallel to the vector. 

1. Point: (—4, 1); vector: v = (0, —8) 

2. Point: (2, —1); vector: v = (—4, —2) 

3. Point: (0, 0, 0); vector: v = (—3, 0, 1) 

4. Point: (—9, 3, 4); vector: v = (—1, 6, 0) 


In Exercises 5-8, use the given equation of a line to find a point
on the line and a vector parallel to the line.

5. x = (3 - 5t, -6 - t)

6. (x, y, z) = (4t, 7, 4 + 3t)

7. x = (1 - t)(4, 6) + t(-2, 0)

8. x = (1 - t)(0, -5, 1)

In Exercises 9-12, find vector and parametric equations of
the plane that contains the given point and is parallel to the two
vectors.

9. Point: (-3, 1, 0); vectors: $\mathbf{v}_1$ = (0, -3, 6) and
$\mathbf{v}_2$ = (-5, 1, 2)

10. Point: (0, 6, -2); vectors: $\mathbf{v}_1$ = (0, 9, -1) and
$\mathbf{v}_2$ = (0, -3, 0)

11. Point: (-1, 1, 4); vectors: $\mathbf{v}_1$ = (6, -1, 0) and
$\mathbf{v}_2$ = (-1, 3, 1)

12. Point: (0, 5, -4); vectors: $\mathbf{v}_1$ = (0, 0, -5) and
$\mathbf{v}_2$ = (1, -3, -2)

In Exercises 13-14, find vector and parametric equations of
the line in $R^2$ that passes through the origin and is orthogonal
to v.

13. v = (-2, 3)    14. v = (1, -4)

In Exercises 15-16, find vector and parametric equations of
the plane in $R^3$ that passes through the origin and is orthogonal
to v.

15. v = (4, 0, -5) [Hint: Construct two nonparallel vectors
orthogonal to v in $R^3$.]

16. v = (3, 1, -6)

In Exercises 17-20, find the general solution to the linear
system and confirm that the row vectors of the coefficient matrix are
orthogonal to the solution vectors.

17.
\[
\begin{aligned}
x_1 + x_2 + x_3 &= 0 \\
2x_1 + 2x_2 + 2x_3 &= 0 \\
3x_1 + 3x_2 + 3x_3 &= 0
\end{aligned}
\]

18.
\[
\begin{aligned}
x_1 + 3x_2 - 4x_3 &= 0 \\
2x_1 + 6x_2 - 8x_3 &= 0
\end{aligned}
\]

19.
\[
\begin{aligned}
x_1 + 5x_2 + x_3 + 2x_4 - x_5 &= 0 \\
x_1 - 2x_2 - x_3 + 3x_4 + 2x_5 &= 0
\end{aligned}
\]

20.
\[
\begin{aligned}
x_1 + 3x_2 - 4x_3 &= 0 \\
x_1 + 2x_2 + 3x_3 &= 0
\end{aligned}
\]

21. (a) The equation x + y + z = 1 can be viewed as a linear
system of one equation in three unknowns. Express a general
solution of this equation as a particular solution plus a
general solution of the associated homogeneous equation.

(b) Give a geometric interpretation of the result in part (a).

22. (a) The equation x + y = 1 can be viewed as a linear system
of one equation in two unknowns. Express a general
solution of this equation as a particular solution plus a general
solution of the associated homogeneous system.

(b) Give a geometric interpretation of the result in part (a).

23. (a) Find a homogeneous linear system of two equations in
three unknowns whose solution space consists of those
vectors in $R^3$ that are orthogonal to a = (1, 1, 1) and
b = (-2, 3, 0).

(b) What kind of geometric object is the solution space? 


(c) Find a general solution of the system obtained in part (a), 
and confirm that Theorem 3.4.3 holds. 

24. (a) Find a homogeneous linear system of two equations in 

three unknowns whose solution space consists of those 
vectors in R 3 that are orthogonal to a = (—3, 2, — 1) and 
b = (0, -2, -2). 

(b) What kind of geometric object is the solution space? 

(c) Find a general solution of the system obtained in part (a), 
and confirm that Theorem 3.4.3 holds. 

25. Consider the linear systems

\[
\begin{bmatrix} 3 & 2 & -1 \\ 6 & 4 & -2 \\ -3 & -2 & 1 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
=
\begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}
\quad\text{and}\quad
\begin{bmatrix} 3 & 2 & -1 \\ 6 & 4 & -2 \\ -3 & -2 & 1 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
=
\begin{bmatrix} 2 \\ 4 \\ -2 \end{bmatrix}
\]

(a) Find a general solution of the homogeneous system. 

(b) Confirm that xi = 1, x 2 = 0, X 3 = 1 is a solution of the 
nonhomogeneous system. 

(c) Use the results in parts (a) and (b) to find a general solution 
of the nonhomogeneous system. 

(d) Check your result in part (c) by solving the nonhomoge- 
neous system directly. 

26. Consider the linear systems

[matrix equations not recoverable]

(a) Find a general solution of the homogeneous system.

(b) Confirm that xi = 1 , x 2 = 1 , x 3 = 1 is a solution of the 
nonhomogeneous system. 

(c) Use the results in parts (a) and (b) to find a general solution 
of the nonhomogeneous system. 

(d) Check your result in part (c) by solving the nonhomoge- 
neous system directly. 

In Exercises 27-28, find a general solution of the system, and
use that solution to find a general solution of the associated
homogeneous system and a particular solution of the given system.

[systems for Exercises 27 and 28 not recoverable]

29. Let $\mathbf{x} = \mathbf{x}_0 + t\mathbf{v}$ be a line in $R^n$, and let $T: R^n \to R^n$ be a
matrix operator on $R^n$. What kind of geometric object is the
image of this line under the operator T? Explain your reasoning.

True-False Exercises 

TF. In parts (a)-(f) determine whether the statement is true or
false, and justify your answer.

(a) The vector equation of a line can be determined from any point 
lying on the line and a nonzero vector parallel to the line. 

(b) The vector equation of a plane can be determined from any 
point lying in the plane and a nonzero vector parallel to the 
plane. 

(c) The points lying on a line through the origin in R 2 or R 3 are 
all scalar multiples of any nonzero vector on the line. 

(d) All solution vectors of the linear system Ax = b are orthogo- 
nal to the row vectors of the matrix A if and only if b = 0. 


(e) The general solution of the nonhomogeneous linear system 
Ax = b can be obtained by adding b to the general solution 
of the homogeneous linear system Ax = 0. 

(f) If $\mathbf{x}_1$ and $\mathbf{x}_2$ are two solutions of the nonhomogeneous linear
system Ax = b, then $\mathbf{x}_1 - \mathbf{x}_2$ is a solution of the corresponding
homogeneous linear system.

Working with Technology 

T1. Find the general solution of the homogeneous linear system

[matrix equation in the unknowns $x_1, \ldots, x_6$ not recoverable]

and confirm that each solution vector is orthogonal to every row
vector of the coefficient matrix in accordance with Theorem 3.4.3.


3.5 Cross Product 

This optional section is concerned with properties of vectors in 3-space that are important 
to physicists and engineers. It can be omitted, if desired, since subsequent sections do not 
depend on its content. Among other things, we define an operation that provides a way of 
constructing a vector in 3-space that is perpendicular to two given vectors, and we give a 
geometric interpretation of 3 x 3 determinants. 

Cross Product of Vectors

In Section 3.2 we defined the dot product of two vectors u and v in n-space. That operation
produced a scalar as its result. We will now define a type of vector multiplication that
produces a vector as the result but which is applicable only to vectors in 3-space.


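DEFINITION 1 If $\mathbf{u} = (u_1, u_2, u_3)$ and $\mathbf{v} = (v_1, v_2, v_3)$ are vectors in 3-space, then
the cross product u x v is the vector defined by

\[
\mathbf{u} \times \mathbf{v} = (u_2 v_3 - u_3 v_2,\; u_3 v_1 - u_1 v_3,\; u_1 v_2 - u_2 v_1)
\]

or, in determinant notation,

\[
\mathbf{u} \times \mathbf{v} = \left(
\begin{vmatrix} u_2 & u_3 \\ v_2 & v_3 \end{vmatrix},\;
-\begin{vmatrix} u_1 & u_3 \\ v_1 & v_3 \end{vmatrix},\;
\begin{vmatrix} u_1 & u_2 \\ v_1 & v_2 \end{vmatrix}
\right) \tag{1}
\]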

Remark Instead of memorizing (1), you can obtain the components of u x v as follows:

Form the 2 x 3 matrix
\[
\begin{bmatrix} u_1 & u_2 & u_3 \\ v_1 & v_2 & v_3 \end{bmatrix}
\]
whose first row contains the components of u and whose
second row contains the components of v.

To find the first component of u x v, delete the first column and take the determinant; to find
the second component, delete the second column and take the negative of the determinant; and
to find the third component, delete the third column and take the determinant.


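► EXAMPLE 1 Calculating a Cross Product

Find u x v, where u = (1, 2, -2) and v = (3, 0, 1).

Solution From either (1) or the mnemonic in the preceding remark, we have

\[
\mathbf{u} \times \mathbf{v} = \left(
\begin{vmatrix} 2 & -2 \\ 0 & 1 \end{vmatrix},\;
-\begin{vmatrix} 1 & -2 \\ 3 & 1 \end{vmatrix},\;
\begin{vmatrix} 1 & 2 \\ 3 & 0 \end{vmatrix}
\right) = (2, -7, -6) \quad ◄
\]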
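The delete-a-column mnemonic translates directly into code. The sketch below is illustrative only; the function names `det2` and `cross` are hypothetical, and vectors are represented as plain 3-tuples.

```python
def det2(a, b, c, d):
    """Determinant of the 2 x 2 matrix [[a, b], [c, d]]."""
    return a * d - b * c


def cross(u, v):
    """Cross product of two 3-space vectors via the column-deletion mnemonic."""
    u1, u2, u3 = u
    v1, v2, v3 = v
    return (
        det2(u2, u3, v2, v3),    # delete column 1, take the determinant
        -det2(u1, u3, v1, v3),   # delete column 2, take the negative
        det2(u1, u2, v1, v2),    # delete column 3, take the determinant
    )


print(cross((1, 2, -2), (3, 0, 1)))  # (2, -7, -6), as in Example 1
```

Expanding the three 2 x 2 determinants gives exactly the component formula in Definition 1, so either form can be used.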


The following theorem gives some important relationships between the dot product 
and cross product and also shows that u x v is orthogonal to both u and v. 


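The formulas for the vector triple products in parts (d) and (e) of Theorem 3.5.1 are
useful because they allow us to use dot products and scalar multiplications to perform
calculations that would otherwise require determinants to calculate the required cross
products.

THEOREM 3.5.1 Relationships Involving Cross Product and Dot Product

If u, v, and w are vectors in 3-space, then

(a) $\mathbf{u} \cdot (\mathbf{u} \times \mathbf{v}) = 0$   [u x v is orthogonal to u]

(b) $\mathbf{v} \cdot (\mathbf{u} \times \mathbf{v}) = 0$   [u x v is orthogonal to v]

(c) $\|\mathbf{u} \times \mathbf{v}\|^2 = \|\mathbf{u}\|^2 \|\mathbf{v}\|^2 - (\mathbf{u} \cdot \mathbf{v})^2$   [Lagrange's identity]

(d) $\mathbf{u} \times (\mathbf{v} \times \mathbf{w}) = (\mathbf{u} \cdot \mathbf{w})\mathbf{v} - (\mathbf{u} \cdot \mathbf{v})\mathbf{w}$   [vector triple product]

(e) $(\mathbf{u} \times \mathbf{v}) \times \mathbf{w} = (\mathbf{u} \cdot \mathbf{w})\mathbf{v} - (\mathbf{v} \cdot \mathbf{w})\mathbf{u}$   [vector triple product]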

Proof (a) Let $\mathbf{u} = (u_1, u_2, u_3)$ and $\mathbf{v} = (v_1, v_2, v_3)$. Then

\[
\begin{aligned}
\mathbf{u} \cdot (\mathbf{u} \times \mathbf{v}) &= (u_1, u_2, u_3) \cdot (u_2 v_3 - u_3 v_2,\; u_3 v_1 - u_1 v_3,\; u_1 v_2 - u_2 v_1) \\
&= u_1(u_2 v_3 - u_3 v_2) + u_2(u_3 v_1 - u_1 v_3) + u_3(u_1 v_2 - u_2 v_1) = 0
\end{aligned}
\]

Proof (b) Similar to (a).

Proof (c) Since

\[
\|\mathbf{u} \times \mathbf{v}\|^2 = (u_2 v_3 - u_3 v_2)^2 + (u_3 v_1 - u_1 v_3)^2 + (u_1 v_2 - u_2 v_1)^2 \tag{2}
\]

and

\[
\|\mathbf{u}\|^2 \|\mathbf{v}\|^2 - (\mathbf{u} \cdot \mathbf{v})^2 = (u_1^2 + u_2^2 + u_3^2)(v_1^2 + v_2^2 + v_3^2) - (u_1 v_1 + u_2 v_2 + u_3 v_3)^2 \tag{3}
\]

the proof can be completed by "multiplying out" the right sides of (2) and (3) and
verifying their equality.

Proof (d) and (e) See Exercises 40 and 41.


The cross product notation A x B was introduced by the American physicist and
mathematician J. Willard Gibbs (see p. 146) in a series of unpublished lecture notes for his students
at Yale University. It appeared in a published work for the first time in the second edition of the book
Vector Analysis, by Edwin Wilson (1879-1964), a student of Gibbs. Gibbs originally referred to A x B
as the "skew product."

► EXAMPLE 2 u x v Is Perpendicular to u and to v

Consider the vectors

u = (1, 2, -2) and v = (3, 0, 1)

In Example 1 we showed that

u x v = (2, -7, -6)

Since

\[
\mathbf{u} \cdot (\mathbf{u} \times \mathbf{v}) = (1)(2) + (2)(-7) + (-2)(-6) = 0
\]

and

\[
\mathbf{v} \cdot (\mathbf{u} \times \mathbf{v}) = (3)(2) + (0)(-7) + (1)(-6) = 0
\]

u x v is orthogonal to both u and v, as guaranteed by Theorem 3.5.1. ◄
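All five parts of Theorem 3.5.1 can be spot-checked on concrete vectors. The following sketch uses u and v from Example 2 together with an arbitrarily chosen w = (1, 1, 1); the helper names (`dot`, `cross`, `scale`, `sub`) are hypothetical. A numerical check on one triple of vectors illustrates the identities but does not prove them.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cross(u, v):
    """Component formula from Definition 1."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def scale(k, u):
    return tuple(k * a for a in u)


def sub(u, v):
    return tuple(a - b for a, b in zip(u, v))


u, v, w = (1, 2, -2), (3, 0, 1), (1, 1, 1)   # w is an arbitrary choice
uxv = cross(u, v)

checks = {
    "(a)": dot(u, uxv) == 0,
    "(b)": dot(v, uxv) == 0,
    # Lagrange's identity: ||u x v||^2 = ||u||^2 ||v||^2 - (u . v)^2
    "(c)": dot(uxv, uxv) == dot(u, u) * dot(v, v) - dot(u, v) ** 2,
    # u x (v x w) = (u . w)v - (u . v)w
    "(d)": cross(u, cross(v, w)) == sub(scale(dot(u, w), v), scale(dot(u, v), w)),
    # (u x v) x w = (u . w)v - (v . w)u
    "(e)": cross(uxv, w) == sub(scale(dot(u, w), v), scale(dot(v, w), u)),
}
print(checks)
```

Integer inputs keep every comparison exact; with floating-point inputs a tolerance would be needed.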


The main arithmetic properties of the cross product are listed in the next theorem. 


THEOREM 3.5.2 Properties of Cross Product

If u, v, and w are any vectors in 3-space and k is any scalar, then:

(a) u x v = -(v x u)

(b) u x (v + w) = (u x v) + (u x w)

(c) (u + v) x w = (u x w) + (v x w)

(d) k(u x v) = (ku) x v = u x (kv)

(e) u x 0 = 0 x u = 0

(f) u x u = 0

The proofs follow immediately from Formula (1) and properties of determinants; for 
example, part (a) can be proved as follows. 

Proof (a) Interchanging u and v in (1) interchanges the rows of the three determinants 
on the right side of (1) and hence changes the sign of each component in the cross pro- 
duct. Thus u x v = — (v x u). 

The proofs of the remaining parts are left as exercises. 



Joseph Louis Lagrange 
(1736-1813) 


Joseph Louis Lagrange was a French-Italian mathematician and astronomer. Although his
father wanted him to become a lawyer, Lagrange was attracted to mathematics and astronomy after reading 
a memoir by the astronomer Halley. At age 16 he began to study mathematics on his own and by age 19 
was appointed to a professorship at the Royal Artillery School in Turin. The following year he solved some 
famous problems using new methods that eventually blossomed into a branch of mathematics called the 
calculus of variations. These methods and Lagrange's applications of them to problems in celestial mechanics 
were so monumental that by age 25 he was regarded by many of his contemporaries as the greatest living 
mathematician. One of Lagrange's most famous works is a memoir, Mecanique Analytique, in which he 
reduced the theory of mechanics to a few general formulas from which all other necessary equations could 
be derived. Napoleon was a great admirer of Lagrange and showered him with many honors. In spite of his 
fame, Lagrange was a shy and modest man. On his death, he was buried with honor in the Pantheon. 

[Image: © travelerm6/iStockphoto]

▲ Figure 3.5.1 The standard
unit vectors.



▲ Figure 3.5.2 


Determinant Form of Cross 
Product 


► EXAMPLE 3 Cross Products of the Standard Unit Vectors 

Recall from Section 3.2 that the standard unit vectors in 3-space are 

i= (1,0,0), j= (0,1,0), k= (0,0,1) 

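These vectors each have length 1 and lie along the coordinate axes (Figure 3.5.1). Every
vector $\mathbf{v} = (v_1, v_2, v_3)$ in 3-space is expressible in terms of i, j, and k since we can write

\[
\mathbf{v} = (v_1, v_2, v_3) = v_1(1, 0, 0) + v_2(0, 1, 0) + v_3(0, 0, 1) = v_1\mathbf{i} + v_2\mathbf{j} + v_3\mathbf{k}
\]

For example,

\[
(2, -3, 4) = 2\mathbf{i} - 3\mathbf{j} + 4\mathbf{k}
\]

From (1) we obtain

\[
\mathbf{i} \times \mathbf{j} = \left(
\begin{vmatrix} 0 & 0 \\ 1 & 0 \end{vmatrix},\;
-\begin{vmatrix} 1 & 0 \\ 0 & 0 \end{vmatrix},\;
\begin{vmatrix} 1 & 0 \\ 0 & 1 \end{vmatrix}
\right) = (0, 0, 1) = \mathbf{k} \quad ◄
\]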


You should have no trouble obtaining the following results: 

i x i = 0     j x j = 0     k x k = 0

i x j = k     j x k = i     k x i = j

j x i = -k    k x j = -i    i x k = -j

Figure 3.5.2 is helpful for remembering these results. Referring to this diagram, the cross 
product of two consecutive vectors going clockwise is the next vector around, and the 
cross product of two consecutive vectors going counterclockwise is the negative of the 
next vector around. 
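The nine products of standard unit vectors can also be generated mechanically from the component formula in Definition 1. This is a small sketch with hypothetical helper names (`cross`, `label`); it prints one row of the table per left-hand factor.

```python
def cross(u, v):
    """Component formula from Definition 1."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


i, j, k = (1, 0, 0), (0, 1, 0), (0, 0, 1)
zero = (0, 0, 0)
labels = {i: "i", j: "j", k: "k", zero: "0"}


def label(vec):
    """Name a vector as 0, a standard unit vector, or the negative of one."""
    if vec in labels:
        return labels[vec]
    negated = tuple(-c for c in vec)
    return "-" + labels[negated]


for a in (i, j, k):
    print("   ".join(f"{label(a)} x {label(b)} = {label(cross(a, b))}" for b in (i, j, k)))
```

Reading across each printed row in cyclic order (i, j, k, i, ...) reproduces the clockwise and counterclockwise pattern described for Figure 3.5.2.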


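It is also worth noting that a cross product can be represented symbolically in the form

\[
\mathbf{u} \times \mathbf{v} =
\begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ u_1 & u_2 & u_3 \\ v_1 & v_2 & v_3 \end{vmatrix}
= \begin{vmatrix} u_2 & u_3 \\ v_2 & v_3 \end{vmatrix}\mathbf{i}
- \begin{vmatrix} u_1 & u_3 \\ v_1 & v_3 \end{vmatrix}\mathbf{j}
+ \begin{vmatrix} u_1 & u_2 \\ v_1 & v_2 \end{vmatrix}\mathbf{k} \tag{4}
\]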


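For example, if u = (1, 2, -2) and v = (3, 0, 1), then

\[
\mathbf{u} \times \mathbf{v} =
\begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ 1 & 2 & -2 \\ 3 & 0 & 1 \end{vmatrix}
= 2\mathbf{i} - 7\mathbf{j} - 6\mathbf{k}
\]

which agrees with the result obtained in Example 1.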


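WARNING It is not true in general that u x (v x w) = (u x v) x w. For example,

\[
\mathbf{i} \times (\mathbf{j} \times \mathbf{j}) = \mathbf{i} \times \mathbf{0} = \mathbf{0}
\]

and

\[
(\mathbf{i} \times \mathbf{j}) \times \mathbf{j} = \mathbf{k} \times \mathbf{j} = -\mathbf{i}
\]

so

\[
\mathbf{i} \times (\mathbf{j} \times \mathbf{j}) \neq (\mathbf{i} \times \mathbf{j}) \times \mathbf{j}
\]

▲ Figure 3.5.3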
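The failure of associativity in the warning above is easy to reproduce. A minimal sketch, assuming the component formula of Definition 1 and a hypothetical helper `cross`:

```python
def cross(u, v):
    """Component formula from Definition 1."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


i, j = (1, 0, 0), (0, 1, 0)

left = cross(i, cross(j, j))    # i x (j x j) = i x 0
right = cross(cross(i, j), j)   # (i x j) x j = k x j

print(left, right)  # (0, 0, 0) (-1, 0, 0)
```

Because the grouping changes the result, parentheses in an iterated cross product cannot be dropped.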


We know from Theorem 3.5.1 that u x v is orthogonal to both u and v. If u and
v are nonzero vectors, it can be shown that the direction of u x v can be determined
using the following "right-hand rule" (Figure 3.5.3): Let θ be the angle between u and
v, and suppose u is rotated through the angle θ until it coincides with v. If the fingers of
the right hand are cupped so that they point in the direction of rotation, then the thumb
indicates (roughly) the direction of u x v.

You may find it instructive to practice this rule with the products

i x j = k,   j x k = i,   k x i = j


Geometric Interpretation of Cross Product

If u and v are vectors in 3-space, then the norm of u x v has a useful geometric
interpretation. Lagrange's identity, given in Theorem 3.5.1, states that

\[
\|\mathbf{u} \times \mathbf{v}\|^2 = \|\mathbf{u}\|^2 \|\mathbf{v}\|^2 - (\mathbf{u} \cdot \mathbf{v})^2 \tag{5}
\]

If θ denotes the angle between u and v, then $\mathbf{u} \cdot \mathbf{v} = \|\mathbf{u}\| \|\mathbf{v}\| \cos\theta$, so (5) can be rewritten
as

\[
\begin{aligned}
\|\mathbf{u} \times \mathbf{v}\|^2 &= \|\mathbf{u}\|^2 \|\mathbf{v}\|^2 - \|\mathbf{u}\|^2 \|\mathbf{v}\|^2 \cos^2\theta \\
&= \|\mathbf{u}\|^2 \|\mathbf{v}\|^2 (1 - \cos^2\theta) \\
&= \|\mathbf{u}\|^2 \|\mathbf{v}\|^2 \sin^2\theta
\end{aligned}
\]

Since $0 \le \theta \le \pi$, it follows that $\sin\theta \ge 0$, so this can be rewritten as

\[
\|\mathbf{u} \times \mathbf{v}\| = \|\mathbf{u}\| \|\mathbf{v}\| \sin\theta \tag{6}
\]

But $\|\mathbf{v}\| \sin\theta$ is the altitude of the parallelogram determined by u and v (Figure 3.5.4).
Thus, from (6), the area A of this parallelogram is given by

\[
A = (\text{base})(\text{altitude}) = \|\mathbf{u}\| \|\mathbf{v}\| \sin\theta = \|\mathbf{u} \times \mathbf{v}\|
\]

This result is even correct if u and v are collinear, since the parallelogram determined by
u and v has zero area and from (6) we have u x v = 0 because θ = 0 in this case. Thus
we have the following theorem.

THEOREM 3.5.3 Area of a Parallelogram

If u and v are vectors in 3-space, then ||u x v|| is equal to the area of the parallelogram
determined by u and v.
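The area formula can be turned into two short functions, one for a parallelogram and one for the triangle that is half of it. This is a sketch only: the function names are hypothetical, and the sample vertices (2, 2, 0), (-1, 0, 2), (0, 4, 3) are chosen for illustration.

```python
import math


def cross(u, v):
    """Component formula from Definition 1."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def norm(u):
    return math.sqrt(sum(a * a for a in u))


def sub(p, q):
    """Vector from point q to point p."""
    return tuple(a - b for a, b in zip(p, q))


def parallelogram_area(u, v):
    """Area of the parallelogram determined by u and v: ||u x v||."""
    return norm(cross(u, v))


def triangle_area(p1, p2, p3):
    """Half the area of the parallelogram on the edge vectors p1->p2 and p1->p3."""
    return 0.5 * parallelogram_area(sub(p2, p1), sub(p3, p1))


print(triangle_area((2, 2, 0), (-1, 0, 2), (0, 4, 3)))  # 7.5
```

For collinear vectors the cross product is the zero vector, so `parallelogram_area` returns 0.0, matching the degenerate case discussed before the theorem.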



► EXAMPLE 4 Area of a Triangle

Find the area of the triangle determined by the points $P_1(2, 2, 0)$, $P_2(-1, 0, 2)$, and
$P_3(0, 4, 3)$.

Solution The area A of the triangle is $\tfrac{1}{2}$ the area of the parallelogram determined by
the vectors $\overrightarrow{P_1P_2}$ and $\overrightarrow{P_1P_3}$ (Figure 3.5.5). Using the method discussed in Example 1 of
Section 3.1, $\overrightarrow{P_1P_2} = (-3, -2, 2)$ and $\overrightarrow{P_1P_3} = (-2, 2, 3)$. It follows that

\[
\overrightarrow{P_1P_2} \times \overrightarrow{P_1P_3} = (-10, 5, -10)
\]

(verify) and consequently that

\[
A = \tfrac{1}{2} \left\| \overrightarrow{P_1P_2} \times \overrightarrow{P_1P_3} \right\| = \tfrac{1}{2}(15) = \tfrac{15}{2} \quad ◄
\]

DEFINITION 2 If u, v, and w are vectors in 3-space, then

u • (v x w)

is called the scalar triple product of u, v, and w.


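The scalar triple product of $\mathbf{u} = (u_1, u_2, u_3)$, $\mathbf{v} = (v_1, v_2, v_3)$, and $\mathbf{w} = (w_1, w_2, w_3)$
can be calculated from the formula

\[
\mathbf{u} \cdot (\mathbf{v} \times \mathbf{w}) =
\begin{vmatrix} u_1 & u_2 & u_3 \\ v_1 & v_2 & v_3 \\ w_1 & w_2 & w_3 \end{vmatrix} \tag{7}
\]

This follows from Formula (4) since

\[
\begin{aligned}
\mathbf{u} \cdot (\mathbf{v} \times \mathbf{w})
&= \mathbf{u} \cdot \left(
\begin{vmatrix} v_2 & v_3 \\ w_2 & w_3 \end{vmatrix}\mathbf{i}
- \begin{vmatrix} v_1 & v_3 \\ w_1 & w_3 \end{vmatrix}\mathbf{j}
+ \begin{vmatrix} v_1 & v_2 \\ w_1 & w_2 \end{vmatrix}\mathbf{k}
\right) \\
&= \begin{vmatrix} v_2 & v_3 \\ w_2 & w_3 \end{vmatrix} u_1
- \begin{vmatrix} v_1 & v_3 \\ w_1 & w_3 \end{vmatrix} u_2
+ \begin{vmatrix} v_1 & v_2 \\ w_1 & w_2 \end{vmatrix} u_3 \\
&= \begin{vmatrix} u_1 & u_2 & u_3 \\ v_1 & v_2 & v_3 \\ w_1 & w_2 & w_3 \end{vmatrix}
\end{aligned}
\]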


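► EXAMPLE 5 Calculating a Scalar Triple Product

Calculate the scalar triple product u • (v x w) of the vectors

u = 3i - 2j - 5k,   v = i + 4j - 4k,   w = 3j + 2k

Solution From (7),

\[
\begin{aligned}
\mathbf{u} \cdot (\mathbf{v} \times \mathbf{w})
&= \begin{vmatrix} 3 & -2 & -5 \\ 1 & 4 & -4 \\ 0 & 3 & 2 \end{vmatrix} \\
&= 3\begin{vmatrix} 4 & -4 \\ 3 & 2 \end{vmatrix}
- (-2)\begin{vmatrix} 1 & -4 \\ 0 & 2 \end{vmatrix}
+ (-5)\begin{vmatrix} 1 & 4 \\ 0 & 3 \end{vmatrix} \\
&= 60 + 4 - 15 = 49 \quad ◄
\end{aligned}
\]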

Remark The symbol (u • v) x w makes no sense because we cannot form the cross product of 
a scalar and a vector. Thus, no ambiguity arises if we write u • v x w rather than u • (v x w). 
However, for clarity we will usually keep the parentheses. 



It follows from (7) that 

u • (v x w) = w • (u x v) = v • (w x u) 

since the 3 x 3 determinants that represent these products can be obtained from one 
another by two row interchanges. (Verify.) These relationships can be remembered by 
moving the vectors u, v, and w clockwise around the vertices of the triangle in Figure 3.5.6. 
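The scalar triple product and its invariance under cyclic rearrangement can be checked with the vectors of Example 5. The sketch below computes the product two ways, as a dot product with a cross product and as a 3 x 3 determinant expanded along its first row; the function names are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cross(u, v):
    """Component formula from Definition 1."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def triple(u, v, w):
    """Scalar triple product u . (v x w)."""
    return dot(u, cross(v, w))


def det3(r1, r2, r3):
    """3 x 3 determinant with rows r1, r2, r3, by cofactor expansion along r1."""
    a, b, c = r1
    d, e, f = r2
    g, h, i = r3
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


u, v, w = (3, -2, -5), (1, 4, -4), (0, 3, 2)

print(triple(u, v, w), det3(u, v, w))    # 49 49
print(triple(w, u, v), triple(v, w, u))  # 49 49  (cyclic rearrangements)
print(triple(u, w, v))                   # -49    (a single interchange flips the sign)
```

The last line illustrates why the cyclic order matters: exchanging only two of the vectors corresponds to a single row interchange in the determinant.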


▲ Figure 3.5.6

Geometric Interpretation of Determinants

The next theorem provides a useful geometric interpretation of 2 x 2 and 3 x 3
determinants.

THEOREM 3.5.4

(a) The absolute value of the determinant

\[
\det \begin{bmatrix} u_1 & u_2 \\ v_1 & v_2 \end{bmatrix}
\]

is equal to the area of the parallelogram in 2-space determined by the vectors
$\mathbf{u} = (u_1, u_2)$ and $\mathbf{v} = (v_1, v_2)$. (See Figure 3.5.7a.)

(b) The absolute value of the determinant

\[
\det \begin{bmatrix} u_1 & u_2 & u_3 \\ v_1 & v_2 & v_3 \\ w_1 & w_2 & w_3 \end{bmatrix}
\]

is equal to the volume of the parallelepiped in 3-space determined by the vectors
$\mathbf{u} = (u_1, u_2, u_3)$, $\mathbf{v} = (v_1, v_2, v_3)$, and $\mathbf{w} = (w_1, w_2, w_3)$. (See Figure 3.5.7b.)

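Proof (a) The key to the proof is to use Theorem 3.5.3. However, that theorem applies
to vectors in 3-space, whereas $\mathbf{u} = (u_1, u_2)$ and $\mathbf{v} = (v_1, v_2)$ are vectors in 2-space. To
circumvent this "dimension problem," we will view u and v as vectors in the xy-plane of
an xyz-coordinate system (Figure 3.5.7c), in which case these vectors are expressed as
$\mathbf{u} = (u_1, u_2, 0)$ and $\mathbf{v} = (v_1, v_2, 0)$. Thus

\[
\mathbf{u} \times \mathbf{v} =
\begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ u_1 & u_2 & 0 \\ v_1 & v_2 & 0 \end{vmatrix}
= \begin{vmatrix} u_1 & u_2 \\ v_1 & v_2 \end{vmatrix}\mathbf{k}
= \det \begin{bmatrix} u_1 & u_2 \\ v_1 & v_2 \end{bmatrix}\mathbf{k}
\]

It now follows from Theorem 3.5.3 and the fact that ||k|| = 1 that the area A of the
parallelogram determined by u and v is

\[
A = \|\mathbf{u} \times \mathbf{v}\|
= \left\| \det \begin{bmatrix} u_1 & u_2 \\ v_1 & v_2 \end{bmatrix}\mathbf{k} \right\|
= \left| \det \begin{bmatrix} u_1 & u_2 \\ v_1 & v_2 \end{bmatrix} \right| \|\mathbf{k}\|
= \left| \det \begin{bmatrix} u_1 & u_2 \\ v_1 & v_2 \end{bmatrix} \right|
\]

which completes the proof.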


Proof (b) As shown in Figure 3.5.8, take the base of the parallelepiped determined by u,
v, and w to be the parallelogram determined by v and w. It follows from Theorem 3.5.3
that the area of the base is ||v x w|| and, as illustrated in Figure 3.5.8, the height h of
the parallelepiped is the length of the orthogonal projection of u on v x w. Therefore,
by Formula (12) of Section 3.3,

\[
h = \|\operatorname{proj}_{\mathbf{v} \times \mathbf{w}} \mathbf{u}\| = \frac{|\mathbf{u} \cdot (\mathbf{v} \times \mathbf{w})|}{\|\mathbf{v} \times \mathbf{w}\|}
\]

It follows that the volume V of the parallelepiped is

\[
V = (\text{area of base}) \cdot \text{height} = \|\mathbf{v} \times \mathbf{w}\| \, \frac{|\mathbf{u} \cdot (\mathbf{v} \times \mathbf{w})|}{\|\mathbf{v} \times \mathbf{w}\|} = |\mathbf{u} \cdot (\mathbf{v} \times \mathbf{w})|
\]

so from (7),

\[
V = \left| \det \begin{bmatrix} u_1 & u_2 & u_3 \\ v_1 & v_2 & v_3 \\ w_1 & w_2 & w_3 \end{bmatrix} \right| \tag{8}
\]

which completes the proof.


Remark If V denotes the volume of the parallelepiped determined by vectors u, v, and w, then
it follows from Formulas (7) and (8) that

\[
V = \begin{bmatrix} \text{volume of parallelepiped} \\ \text{determined by } \mathbf{u}, \mathbf{v}, \text{ and } \mathbf{w} \end{bmatrix} = |\mathbf{u} \cdot (\mathbf{v} \times \mathbf{w})| \tag{9}
\]

From this result and the discussion immediately following Definition 3 of Section 3.2, we can
conclude that

\[
\mathbf{u} \cdot (\mathbf{v} \times \mathbf{w}) = \pm V
\]

where the + or - results depending on whether u makes an acute or an obtuse angle with v x w.


Formula (9) leads to a useful test for ascertaining whether three given vectors lie in 
the same plane. Since three vectors not in the same plane determine a parallelepiped of 
positive volume, it follows from (9) that |u • (v x w)| = 0 if and only if the vectors u, v, 
and w lie in the same plane. Thus we have the following result. 


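THEOREM 3.5.5 If the vectors $\mathbf{u} = (u_1, u_2, u_3)$, $\mathbf{v} = (v_1, v_2, v_3)$, and
$\mathbf{w} = (w_1, w_2, w_3)$ have the same initial point, then they lie in the same plane if and only
if

\[
\mathbf{u} \cdot (\mathbf{v} \times \mathbf{w}) =
\begin{vmatrix} u_1 & u_2 & u_3 \\ v_1 & v_2 & v_3 \\ w_1 & w_2 & w_3 \end{vmatrix} = 0
\]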

Exercise Set 3.5 

In Exercises 1-2, let u = (3, 2, -1), v = (0, 2, -3), and
w = (2, 6, 7). Compute the indicated vectors.

1. (a) v x w   (b) w x v   (c) (u + v) x w

(d) v • (v x w)   (e) v x v   (f) (u - 3w) x (u - 3w)

2. (a) u x v   (b) -(u x v)   (c) u x (v + w)

(d) w • (w x v)   (e) w x w   (f) (7v - 3u) x (7v - 3u)

In Exercises 3-4, let u, v, and w be the vectors in Exercises 1-2.
Use Lagrange's identity to rewrite the expression using only dot
products and scalar multiplications, and then confirm your result
by evaluating both sides of the identity.

3. $\|\mathbf{u} \times \mathbf{w}\|^2$    4. $\|\mathbf{v} \times \mathbf{u}\|^2$

In Exercises 5-6, let u, v, and w be the vectors in Exercises 1-2.
Compute the vector triple product directly, and check your result
by using parts (d) and (e) of Theorem 3.5.1.

5. u x (v x w)    6. (u x v) x w

In Exercises 7-8, use the cross product to find a vector that is
orthogonal to both u and v.

7. u = (-6, 4, 2), v = (3, 1, 5)

8. u = (1, 1, -2), v = (2, -1, 2)

In Exercises 9-10, find the area of the parallelogram
determined by the given vectors u and v.

9. u = (1, -1, 2), v = (0, 3, 1)

10. u = (3, -1, 4), v = (6, -2, 8)

In Exercises 11-12, find the area of the parallelogram with the
given vertices.

11. $P_1(1, 2)$, $P_2(4, 4)$, $P_3(7, 5)$, $P_4(4, 3)$

12. $P_1(3, 2)$, $P_2(5, 4)$, $P_3(9, 4)$, $P_4(7, 2)$

In Exercises 13-14, find the area of the triangle with the given
vertices.

13. A(2, 0), B(3, 4), C(-1, 2)

14. A(1, 1), B(2, 2), C(3, -3)

In Exercises 15-16, find the area of the triangle in 3-space that
has the given vertices.

15. $P_1(2, 6, -1)$, $P_2(1, 1, 1)$, $P_3(4, 6, 2)$

16. P(1, -1, 2), Q(0, 3, 4), R(6, 1, 8)

In Exercises 17-18, find the volume of the parallelepiped with
sides u, v, and w.

17. u = (2, -6, 2), v = (0, 4, -2), w = (2, 2, -4)

18. u = (3, 1, 2), v = (4, 5, 1), w = (1, 2, 4)

In Exercises 19-20, determine whether u, v, and w lie in the
same plane when positioned so that their initial points coincide.

19. u = (-1, -2, 1), v = (3, 0, -2), w = (5, -4, 0)

20. u = (5, -2, 1), v = (4, -1, 1), w = (1, -1, 0)

In Exercises 21-24, compute the scalar triple product
u • (v x w).

21. u = (-2, 0, 6), v = (1, -3, 1), w = (-5, -1, 1)

22. u = (-1, 2, 4), v = (3, 4, -2), w = (-1, 2, 5)

23. u = (a, 0, 0), v = (0, b, 0), w = (0, 0, c)

24. u = i, v = j, w = k

In Exercises 25-26, suppose that u • (v x w) = 3. Find

25. (a) u • (w x v)   (b) (v x w) • u   (c) w • (u x v)

26. (a) v • (u x w)   (b) (u x w) • v   (c) v • (w x w)

27. (a) Find the area of the triangle having vertices A(1, 0, 1),
B(0, 2, 3), and C(2, 1, 0).

(b) Use the result of part (a) to find the length of the altitude 
from vertex C to side AB. 

28. Use the cross product to find the sine of the angle between the 
vectors u = (2, 3, —6) and v = (2, 3, 6). 

29. Simplify (u + v) x (u - v). 


Exercises 31-32 You know from your own experience that
the tendency for a force to cause a rotation about an axis depends
on the amount of force applied and its distance from the axis of
rotation. For example, it is easier to close a door by pushing on
its outer edge than close to its hinges. Moreover, the harder you
push, the faster the door will close. In physics, the tendency for a
force vector F to cause rotational motion is a vector called torque
(denoted by τ). It is defined as

\[
\boldsymbol{\tau} = \mathbf{F} \times \mathbf{d}
\]

where d is the vector from the axis of rotation to the point at which
the force is applied. It follows from Formula (6) that

\[
\|\boldsymbol{\tau}\| = \|\mathbf{F} \times \mathbf{d}\| = \|\mathbf{F}\| \|\mathbf{d}\| \sin\theta
\]

where θ is the angle between the vectors F and d. This is called the
scalar moment of F about the axis of rotation and is typically
measured in units of newton-meters (N·m) or foot-pounds (ft-lb).


31. The accompanying figure shows a force F of 1000 N applied 
to the corner of a box. 

(a) Find the scalar moment of F about the point P . 

(b) Find the direction angles of the vector moment of F about 
the point P to the nearest degree. [See directions for Ex- 
ercises 21-25 of Section 3.2.] 



◄ Figure Ex-31 


32. As shown in the accompanying figure, a force of 200 N is ap- 
plied at an angle of 18° to a point near the end of a monkey 
wrench. Find the scalar moment of the force about the center 
of the bolt. [Note: Treat the wrench as two-dimensional.] 



Working with Proofs 

33. Let u, v, and w be nonzero vectors in 3-space with the same 
initial point, but such that no two of them are collinear. Prove 
that 

(a) u x (v x w) lies in the plane determined by v and w. 

(b) (u x v) x w lies in the plane determined by u and v. 


30. Let $\mathbf{a} = (a_1, a_2, a_3)$, $\mathbf{b} = (b_1, b_2, b_3)$, $\mathbf{c} = (c_1, c_2, c_3)$, and
$\mathbf{d} = (d_1, d_2, d_3)$. Show that

(a + d) • (b x c) = a • (b x c) + d • (b x c)


34. Prove the following identities.

(a) (u + kv) x v = u x v

(b) u • (v x z) = -(u x z) • v

35. Prove: If a, b, c, and d lie in the same plane, then
(a x b) x (c x d) = 0.

36. Prove: If θ is the angle between u and v and u • v ≠ 0, then
tan θ = ||u x v||/(u • v).

37. Prove that if u, v, and w are vectors in $R^3$, no two of which are
collinear, then u x (v x w) lies in the plane determined by v
and w.

38. It is a theorem of solid geometry that the volume of a
tetrahedron is $\tfrac{1}{3}$ (area of base) · (height). Use this result to prove
that the volume of a tetrahedron whose sides are the vectors
a, b, and c is $\tfrac{1}{6}$ |a • (b x c)| (see accompanying figure).

◄ Figure Ex-38 

39. Use the result of Exercise 38 to find the volume of the
tetrahedron with vertices P, Q, R, S.

(a) P(-1, 2, 0), Q(2, 1, -3), R(1, 1, 1), S(3, -2, 3)

(b) P(0, 0, 0), Q(1, 2, -1), R(3, 4, 0), S(-1, -3, 4)

40. Prove part (d) of Theorem 3.5.1. [Hint: First prove the
result in the case where w = i = (1, 0, 0), then when
w = j = (0, 1, 0), and then when w = k = (0, 0, 1). Finally,
prove it for an arbitrary vector $\mathbf{w} = (w_1, w_2, w_3)$ by writing
$\mathbf{w} = w_1\mathbf{i} + w_2\mathbf{j} + w_3\mathbf{k}$.]

41. Prove part (e) of Theorem 3.5.1. [Hint: Apply part (a) of 
Theorem 3.5.2 to the result in part (d) of Theorem 3.5.1.] 

42. Prove: 

(a) Prove (b) of Theorem 3.5.2. 

(b) Prove (c) of Theorem 3.5.2. 



(c) Prove (d) of Theorem 3.5.2. 

(d) Prove (e) of Theorem 3.5.2. 

(e) Prove (/) of Theorem 3.5.2. 

True-False Exercises 

TF. In parts (a)-(f) determine whether the statement is true or 
false, and justify your answer. 

(a) The cross product of two nonzero vectors u and v is a nonzero 
vector if and only if u and v are not parallel. 

(b) A normal vector to a plane can be obtained by taking the cross 
product of two nonzero and noncollinear vectors lying in the 
plane. 

(c) The scalar triple product of u, v, and w determines a vector 
whose length is equal to the volume of the parallelepiped de- 
termined by u, v, and w. 

(d) If u and v are vectors in 3-space, then ||v x u|| is equal to the 
area of the parallelogram determined by u and v. 

(e) For all vectors u, v, and w in 3-space, the vectors (u x v) x w 
and u x (v x w) are the same. 

(f) If u, v, and w are vectors in $R^3$, where u is nonzero and
u x v = u x w, then v = w.

Working with Technology 

T1. As stated in Exercise 23 above, the distance d in 3-space from
a point P to the line L through points A and B is given by the
formula

\[
d = \frac{\|\overrightarrow{AP} \times \overrightarrow{AB}\|}{\|\overrightarrow{AB}\|}
\]

Find the distance between the point P(1, 3, 1) and the line through
the points A(2, -3, 4) and B(4, 7, -2).


Supplementary Exercises 


1. Let u = (-2, 0, 4), v = (3, -1, 6), and w = (2, -5, -5).
Compute

(a) 3v - 2u   (b) ||u + v + w||

(c) the distance between -3u and v + 5w

(d) $\operatorname{proj}_{\mathbf{w}} \mathbf{u}$   (e) u • (v x w)

(f) (-5v + w) x ((u • v)w)

2. Repeat Exercise 1 for the vectors u = 3i - 5j + k,
v = -2i + 2k, and w = -j + 4k.

3. Repeat parts (a)-(d) of Exercise 1 for the vectors
u = (-2, 6, 2, 1), v = (-3, 0, 8, 0), and
w = (9, 1, -6, -6).

4. (a) The set of all vectors in R 2 that are orthogonal to a nonzero 

vector is what kind of geometric object? 


(b) The set of all vectors in R 3 that are orthogonal to a nonzero 
vector is what kind of geometric object? 

(c) The set of all vectors in R 2 that are orthogonal to two 
noncollinear vectors is what kind of geometric object? 

(d) The set of all vectors in R 3 that are orthogonal to two 
noncollinear vectors is what kind of geometric object? 

5. Let A, B, and C be three distinct noncollinear points in 3-space.
Describe the set of all points P that satisfy the vector
equation AP · (AB × AC) = 0.

6. Let A, B, C, and D be four distinct noncollinear points in
3-space. If AB × CD ≠ 0 and AC · (AB × CD) = 0, explain
why the line through A and B must intersect the line through
C and D.
7. Consider the points P(3, -1, 4), Q(6, 0, 2), and R(5, 1, 1).
Find the point S in R^3 whose first component is -1 and such
that PQ is parallel to RS.

8. Consider the points P(-3, 1, 0, 6), Q(0, 5, 1, -2), and
R(-4, 1, 4, 0). Find the point S in R^4 whose third component
is 6 and such that PQ is parallel to RS.

9. Using the points in Exercise 7, find the cosine of the angle 
between the vectors PQ and PR. 

10. Using the points in Exercise 8, find the cosine of the angle 
between the vectors PQ and PR. 

11. Find the distance between the point P(-3, 1, 3) and the plane
5x + z = 3y - 4.

12. Show that the planes 3x - y + 6z = 7 and
-6x + 2y - 12z = 1 are parallel, and find the distance between them.


In Exercises 13-18, find vector and parametric equations for
the line or plane in question.

13. The plane in R^3 that contains the points P(-2, 1, 3),
Q(-1, -1, 1), and R(3, 0, -2).

14. The line in R^3 that contains the point P(-1, 6, 0) and is
orthogonal to the plane 4x - z = 5.

15. The line in R^2 that is parallel to the vector v = (8, -1) and
contains the point P(0, -3).

16. The plane in R^3 that contains the point P(-2, 1, 0) and is
parallel to the plane -8x + 6y - z = 4.

17. The line in R^2 with equation y = 3x - 5.

18. The plane in R^3 with equation 2x - 6y + 3z = 5.

In Exercises 19-21, find a point-normal equation for the given
plane.

19. The plane that is represented by the vector equation
(x, y, z) = (-1, 5, 6) + t_1(0, -1, 3) + t_2(2, -1, 0).

20. The plane that contains the point P(-5, 1, 0) and is orthogonal
to the line with parametric equations x = 3 - 5t, y = 2t,
and z = 7.

21. The plane that passes through the points P(9, 0, 4),
Q(-1, 4, 3), and R(0, 6, -2).

22. Suppose that V = {v_1, v_2, v_3} and W = {w_1, w_2} are two sets
of vectors such that each vector in V is orthogonal to each vector
in W. Prove that if a_1, a_2, a_3, b_1, b_2 are any scalars, then
the vectors v = a_1 v_1 + a_2 v_2 + a_3 v_3 and w = b_1 w_1 + b_2 w_2 are
orthogonal.


23. Show that in 3-space the distance d from a point P to the line
L through points A and B can be expressed as

d = ||AP × AB|| / ||AB||

24. Prove that ||u + v|| = ||u|| + ||v|| if and only if one of the vectors
is a scalar multiple of the other.

25. The equation Ax + By = 0 represents a line through the ori- 
gin in R 2 if A and B are not both zero. What does this equation 
represent in R 3 if you think of it as Ax + By + 0z = 0? Ex- 
plain. 
Recall that we began our study of vectors by viewing them as directed line segments
(arrows). We then extended this idea by introducing rectangular coordinate systems,
which enabled us to view vectors as ordered pairs and ordered triples of real numbers.
As we developed properties of these vectors we noticed patterns in various formulas
that enabled us to extend the notion of a vector to an n-tuple of real numbers.
Although n-tuples took us outside the realm of our “visual experience,” they gave us a
valuable tool for understanding and studying systems of linear equations. In this
chapter we will extend the concept of a vector yet again by using the most important
algebraic properties of vectors in R^n as axioms. These axioms, if satisfied by a set of
objects, will enable us to think of those objects as vectors.


4.1 Real Vector Spaces

In this section we will extend the concept of a vector by using the basic properties of vectors
in R^n as axioms, which if satisfied by a set of objects, guarantee that those objects behave
like familiar vectors.


The following definition consists of ten axioms, eight of which are properties of vectors
in R^n that were stated in Theorem 3.1.1. It is important to keep in mind that one does
not prove axioms; rather, they are assumptions that serve as the starting point for proving
theorems.
In this text scalars will be either real numbers or complex numbers. Vector spaces with
real scalars will be called real vector spaces and those with complex scalars will be called
complex vector spaces. There is a more general notion of a vector space in which scalars
can come from a mathematical structure known as a “field,” but we will not be concerned
with that level of generality. For now, we will focus exclusively on real vector spaces,
which we will refer to simply as “vector spaces.” We will consider complex vector
spaces later.


DEFINITION 1 Let V be an arbitrary nonempty set of objects on which two operations
are defined: addition, and multiplication by numbers called scalars. By addition we
mean a rule for associating with each pair of objects u and v in V an object u + v,
called the sum of u and v; by scalar multiplication we mean a rule for associating with
each scalar k and each object u in V an object ku, called the scalar multiple of u by k.
If the following axioms are satisfied by all objects u, v, w in V and all scalars k and
m, then we call V a vector space and we call the objects in V vectors.

1. If u and v are objects in V, then u + v is in V.

2. u + v = v + u

3. u + (v + w) = (u + v) + w

4. There is an object 0 in V, called a zero vector for V, such that 0 + u = u + 0 = u
for all u in V.

5. For each u in V, there is an object -u in V, called a negative of u, such that
u + (-u) = (-u) + u = 0.

6. If k is any scalar and u is any object in V, then ku is in V.

7. k(u + v) = ku + kv

8. (k + m)u = ku + mu

9. k(mu) = (km)(u)

10. 1u = u


Observe that the definition of a vector space does not specify the nature of the vectors
or the operations. Any kind of object can be a vector, and the operations of addition
and scalar multiplication need not have any relationship to those on R^n. The only
requirement is that the ten vector space axioms be satisfied. In the examples that follow
we will use four basic steps to show that a set with two operations is a vector space.


To Show That a Set with Two Operations Is a Vector Space 

Step 1. Identify the set V of objects that will become vectors. 

Step 2. Identify the addition and scalar multiplication operations on V. 

Step 3. Verify Axioms 1 and 6; that is, adding two vectors in V produces a vector 
in V, and multiplying a vector in V by a scalar also produces a vector in V. 
Axiom 1 is called closure under addition, and Axiom 6 is called closure under
scalar multiplication. 

Step 4. Confirm that Axioms 2, 3, 4, 5, 7, 8, 9, and 10 hold. 
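
The four steps above can be mirrored in a small numerical spot-check. The sketch below is an illustration under stated assumptions, not a proof: it tests the ten axioms only on a finite list of sample vectors and scalars, so a reported failure is a genuine counterexample, while an empty result merely means that no counterexample was found. The function name `failed_axioms` and all of its parameters are hypothetical names chosen for this example.

```python
from itertools import product

def failed_axioms(samples, scalars, add, smul, zero, neg, contains):
    """Return the axiom numbers (1-10) that fail on the given finite samples."""
    failed = set()
    for u, v in product(samples, repeat=2):
        if not contains(add(u, v)):
            failed.add(1)                      # closure under addition
        if add(u, v) != add(v, u):
            failed.add(2)
    for u, v, w in product(samples, repeat=3):
        if add(u, add(v, w)) != add(add(u, v), w):
            failed.add(3)
    for u in samples:
        if add(zero, u) != u or add(u, zero) != u:
            failed.add(4)
        if add(u, neg(u)) != zero or add(neg(u), u) != zero:
            failed.add(5)
        if smul(1, u) != u:
            failed.add(10)
    for k, u in product(scalars, samples):
        if not contains(smul(k, u)):
            failed.add(6)                      # closure under scalar multiplication
    for k, u, v in product(scalars, samples, samples):
        if smul(k, add(u, v)) != add(smul(k, u), smul(k, v)):
            failed.add(7)
    for k, m, u in product(scalars, scalars, samples):
        if smul(k + m, u) != add(smul(k, u), smul(m, u)):
            failed.add(8)
        if smul(k, smul(m, u)) != smul(k * m, u):
            failed.add(9)
    return failed

# Steps 1 and 2: V = R^3 with the standard operations. Integer samples keep
# the arithmetic exact, so equality tests are reliable.
samples = [(1, 2, 3), (-4, 0, 5), (0, 0, 0), (7, -1, 2)]
scalars = [-2, 0, 1, 3]
add = lambda u, v: tuple(a + b for a, b in zip(u, v))
smul = lambda k, u: tuple(k * a for a in u)
neg = lambda u: tuple(-a for a in u)
contains = lambda u: isinstance(u, tuple) and len(u) == 3

# Steps 3 and 4 on the samples: no axiom fails.
print(failed_axioms(samples, scalars, add, smul, (0, 0, 0), neg, contains))  # set()
```

Swapping in a nonstandard scalar multiplication, for instance one that zeroes every component but the first, makes the function report Axiom 10.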


The notion of an "abstract vector 
space" evolved over many years and had many 
contributors. The idea crystallized with the work 
of the German mathematician H. G. Grassmann, 
who published a paper in 1862 in which he con- 
sidered abstract systems of unspecified elements 
on which he defined formal operations of addi- 
tion and scalar multiplication. Grassmann's work 
was controversial, and others, including Augustin 
Cauchy (p. 121), laid reasonable claim to the idea.
[Image: © Sueddeutsche Zeitung Photo/The
Image Works]

Hermann Günther Grassmann (1809-1877)
Our first example is the simplest of all vector spaces in that it contains only one
object. Since Axiom 4 requires that every vector space contain a zero vector, the object
will have to be that vector.


► EXAMPLE 1 The Zero Vector Space 

Let V consist of a single object, which we denote by 0, and define 

0 + 0 = 0 and k0 = 0

for all scalars k. It is easy to check that all the vector space axioms are satisfied. We call 
this the zero vector space. 


Our second example is one of the most important of all vector spaces: the familiar
space R^n. It should not be surprising that the operations on R^n satisfy the vector space
axioms because those axioms were based on known properties of operations on R^n.


► EXAMPLE 2 R^n Is a Vector Space

Let V = R^n, and define the vector space operations on V to be the usual operations of
addition and scalar multiplication of n-tuples; that is,

u + v = (u_1, u_2, ..., u_n) + (v_1, v_2, ..., v_n) = (u_1 + v_1, u_2 + v_2, ..., u_n + v_n)
ku = (ku_1, ku_2, ..., ku_n)

The set V = R^n is closed under addition and scalar multiplication because the foregoing
operations produce n-tuples as their end result, and these operations satisfy Axioms 2,
3, 4, 5, 7, 8, 9, and 10 by virtue of Theorem 3.1.1.


Our next example is a generalization of R^n in which we allow vectors to have infinitely
many components.


► EXAMPLE 3 The Vector Space of Infinite Sequences of Real Numbers

Let V consist of objects of the form

u = (u_1, u_2, ..., u_n, ...)

in which u_1, u_2, ..., u_n, ... is an infinite sequence of real numbers. We define two
infinite sequences to be equal if their corresponding components are equal, and we define
addition and scalar multiplication componentwise by

u + v = (u_1, u_2, ..., u_n, ...) + (v_1, v_2, ..., v_n, ...)
      = (u_1 + v_1, u_2 + v_2, ..., u_n + v_n, ...)

ku = (ku_1, ku_2, ..., ku_n, ...)
▲ Figure 4.1.1


In the exercises we ask you to confirm that V with these operations is a vector space. We
will denote this vector space by the symbol R^∞.


Vector spaces of the type in Example 3 arise when a transmitted signal of indefinite 
duration is digitized by sampling its values at discrete time intervals (Figure 4.1.1). 
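
A sampled signal of indefinite duration can be modeled in code as a lazy stream of values. The following Python sketch, offered as one possible illustration, represents vectors of the sequence space as generators and implements the componentwise operations of Example 3. The names `seq_add`, `seq_scale`, `naturals`, and `squares` are hypothetical and chosen only for this example.

```python
from itertools import count, islice

def seq_add(u, v):
    """Component-wise sum of two (possibly infinite) sequences."""
    return (a + b for a, b in zip(u, v))

def seq_scale(k, u):
    """Scalar multiple k*u of a (possibly infinite) sequence."""
    return (k * a for a in u)

def naturals():
    return count(1)                      # 1, 2, 3, ...

def squares():
    return (n * n for n in count(1))     # 1, 4, 9, ...

# u + 2v is built lazily; only the first five components are ever computed.
first_five = list(islice(seq_add(naturals(), seq_scale(2, squares())), 5))
print(first_five)  # [3, 10, 21, 36, 55]
```

Because generators are evaluated on demand, the infinite sequences are never stored in full; only the requested prefix is computed.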

In the next example our vectors will be matrices. This may be a little confusing at
first because matrices are composed of rows and columns, which are themselves vectors
(row vectors and column vectors). However, from the vector space viewpoint we are not
concerned with the individual rows and columns but rather with the properties of the
matrix operations as they relate to the matrix as a whole.


Note that Equation (1) involves three different addition operations: the addition
operation on vectors, the addition operation on matrices, and the addition operation on
real numbers.


► EXAMPLE 4 The Vector Space of 2 x 2 Matrices 

Let V be the set of 2 x 2 matrices with real entries, and take the vector space operations 
on V to be the usual operations of matrix addition and scalar multiplication; that is, 


u + v 
The set V is closed under addition and scalar multiplication because the foregoing
operations produce 2 × 2 matrices as the end result. Thus, it remains to confirm that Axioms
2, 3, 4, 5, 7, 8, 9, and 10 hold. Some of these are standard properties of matrix operations.
For example, Axiom 2 follows from Theorem 1.4.1(a) since


U + V = 
Similarly, Axioms 3, 7, 8, and 9 follow from parts (b), (h), (j), and (e), respectively, of
that theorem (verify). This leaves Axioms 4, 5, and 10 that remain to be verified.

To confirm that Axiom 4 is satisfied, we must find a 2 × 2 matrix 0 in V for which
0 + u = u + 0 = u for all 2 × 2 matrices u in V. We can do this by taking


With this definition, 
and similarly u + 0 = u. To verify that Axiom 5 holds we must show that each object
u in V has a negative -u in V such that u + (-u) = 0 and (-u) + u = 0. This can be
done by defining the negative of u to be


— u 
With this definition,

and similarly (-u) + u = 0. Finally, Axiom 10 holds because

► EXAMPLE 5 The Vector Space of m × n Matrices

Example 4 is a special case of a more general class of vector spaces. You should have
no trouble adapting the argument used in that example to show that the set V of all
m × n matrices with the usual matrix operations of addition and scalar multiplication is
a vector space. We will denote this vector space by the symbol M_mn. Thus, for example,
the vector space in Example 4 is denoted as M_22.
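
The matrix operations used in Examples 4 and 5 are easy to spell out concretely. The sketch below, which assumes matrices are stored as Python lists of rows, checks Axioms 2, 4, 5, and 10 for two sample matrices in M_22. The function names and the sample entries are illustrative choices, and checking two matrices is of course not a proof.

```python
def mat_add(a, b):
    """Entry-wise sum of two matrices of the same size (lists of rows)."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def mat_scale(k, a):
    """Scalar multiple k*a, computed entry by entry."""
    return [[k * x for x in row] for row in a]

def zero_matrix(m, n):
    """The m x n zero matrix, which plays the role of the zero vector in M_mn."""
    return [[0] * n for _ in range(m)]

u = [[1, 2], [3, 4]]
v = [[0, -5], [6, 7]]
zero = zero_matrix(2, 2)
neg_u = mat_scale(-1, u)          # candidate for the negative of u

print(mat_add(u, v) == mat_add(v, u))                    # Axiom 2: True
print(mat_add(zero, u) == u and mat_add(u, zero) == u)   # Axiom 4: True
print(mat_add(u, neg_u) == zero)                         # Axiom 5: True
print(mat_scale(1, u) == u)                              # Axiom 10: True
```

The same functions work unchanged for any m × n size, which is the point of Example 5.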
► EXAMPLE 6 The Vector Space of Real-Valued Functions

Let V be the set of real-valued functions that are defined at each x in the interval (-∞, ∞).
If f = f(x) and g = g(x) are two functions in V and if k is any scalar, then define the
operations of addition and scalar multiplication by

(f + g)(x) = f(x) + g(x)    (2)

(kf)(x) = kf(x)    (3)


In Example 6 the functions were defined on the entire interval (-∞, ∞). However, the
arguments used in that example apply as well on all subintervals of (-∞, ∞), such as
a closed interval [a, b] or an open interval (a, b). We will denote the vector spaces of
functions on these intervals by F[a, b] and F(a, b), respectively.


One way to think about these operations is to view the numbers f(x) and g(x) as
“components” of f and g at the point x, in which case Equations (2) and (3) state that two
functions are added by adding corresponding components, and a function is multiplied
by a scalar by multiplying each component by that scalar, exactly as in R^n and R^∞. This
idea is illustrated in parts (a) and (b) of Figure 4.1.2. The set V with these operations is
denoted by the symbol F(-∞, ∞). We can prove that this is a vector space as follows:

Axioms 1 and 6: These closure axioms require that if we add two functions that are
defined at each x in the interval (-∞, ∞), then sums and scalar multiples of those
functions must also be defined at each x in the interval (-∞, ∞). This follows from Formulas
(2) and (3).

Axiom 4: This axiom requires that there exists a function 0 in F(-∞, ∞), which when
added to any other function f in F(-∞, ∞) produces f back again as the result. The
function whose value at every point x in the interval (-∞, ∞) is zero has this property.
Geometrically, the graph of the function 0 is the line that coincides with the x-axis.

Axiom 5: This axiom requires that for each function f in F(-∞, ∞) there exists a function
-f in F(-∞, ∞), which when added to f produces the function 0. The function defined
by (-f)(x) = -f(x) has this property. The graph of -f can be obtained by reflecting the
graph of f about the x-axis (Figure 4.1.2c).

Axioms 2, 3, 7, 8, 9, 10: The validity of each of these axioms follows from properties of
real numbers. For example, if f and g are functions in F(-∞, ∞), then Axiom 2 requires
that f + g = g + f. This follows from the computation

(f + g)(x) = f(x) + g(x) = g(x) + f(x) = (g + f)(x)

in which the first and last equalities follow from (2), and the middle equality is a property
of real numbers. We will leave the proofs of the remaining parts as exercises.
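
Formulas (2) and (3) can be expressed directly with functions that return functions. The following Python sketch is an illustration only: it evaluates the resulting functions at a handful of sample points, which demonstrates the definitions but does not prove the axioms for all x. The names `f_add`, `f_scale`, `f_neg`, `zero`, and `points` are assumptions made for this example.

```python
import math

def f_add(f, g):
    """The function f + g, defined pointwise as in Formula (2)."""
    return lambda x: f(x) + g(x)

def f_scale(k, f):
    """The function kf, defined pointwise as in Formula (3)."""
    return lambda x: k * f(x)

def f_neg(f):
    """The negative of f: its graph is the reflection of f's graph about the x-axis."""
    return lambda x: -f(x)

zero = lambda x: 0.0        # the zero vector: the function that is 0 everywhere

points = [-2.0, -0.5, 0.0, 1.0, 3.25]

# Axiom 5 at the sample points: f + (-f) agrees with the zero function.
h = f_add(math.sin, f_neg(math.sin))
print(all(h(x) == zero(x) for x in points))            # True

# Axiom 2 at the sample points: f + g and g + f take the same values.
fg = f_add(math.sin, math.exp)
gf = f_add(math.exp, math.sin)
print(all(fg(x) == gf(x) for x in points))             # True
```

Here the "components" of a function are its values, so each check is just the corresponding property of real-number arithmetic applied point by point.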



▲ Figure 4.1.2 



It is important to recognize that you cannot impose any two operations on any set
V and expect the vector space axioms to hold. For example, if V is the set of n-tuples
with positive components, and if the standard operations from R^n are used, then V is not
closed under scalar multiplication, because if u is a nonzero n-tuple in V, then (-1)u has
at least one negative component and hence is not in V. The following is a less obvious
example in which only one of the ten vector space axioms fails to hold.

► EXAMPLE 7 A Set That Is Not a Vector Space

Let V = R^2 and define addition and scalar multiplication operations as follows: If
u = (u_1, u_2) and v = (v_1, v_2), then define

u + v = (u_1 + v_1, u_2 + v_2)

and if k is any real number, then define

ku = (ku_1, 0)

For example, if u = (2, 4), v = (-3, 5), and k = 7, then

u + v = (2 + (-3), 4 + 5) = (-1, 9)
ku = 7u = (7 · 2, 0) = (14, 0)

The addition operation is the standard one from R^2, but the scalar multiplication is not.
In the exercises we will ask you to show that the first nine vector space axioms are satisfied.
However, Axiom 10 fails to hold for certain vectors. For example, if u = (u_1, u_2) is such
that u_2 ≠ 0, then

1u = 1(u_1, u_2) = (1 · u_1, 0) = (u_1, 0) ≠ u

Thus, V is not a vector space with the stated operations.
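
The computations in Example 7 can be reproduced in a few lines. This is a minimal Python sketch in which the function names `add` and `smul` are illustrative; it uses the same vectors u = (2, 4) and v = (-3, 5) and the scalar k = 7 as the example.

```python
def add(u, v):
    """Standard addition on R^2."""
    return (u[0] + v[0], u[1] + v[1])

def smul(k, u):
    """The nonstandard scalar multiplication of Example 7: ku = (k*u_1, 0)."""
    return (k * u[0], 0)

u, v, k = (2, 4), (-3, 5), 7

print(add(u, v))                                            # (-1, 9)
print(smul(k, u))                                           # (14, 0)
print(smul(k, add(u, v)) == add(smul(k, u), smul(k, v)))    # Axiom 7 holds here: True
print(smul(1, u))                                           # (2, 0)
print(smul(1, u) == u)                                      # Axiom 10 fails: False
```

A single vector for which 1u differs from u is enough to show that the set with these operations is not a vector space.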


Our final example will be an unusual vector space that we have included to illustrate 
how varied vector spaces can be. Since the vectors in this space will be real numbers, 
it will be important for you to keep track of which operations are intended as vector 
operations and which ones as ordinary operations on real numbers. 

► EXAMPLE 8 An Unusual Vector Space 

Let V be the set of positive real numbers, let u = u and v = v be any vectors (i.e., positive
real numbers) in V, and let k be any scalar. Define the operations on V to be

u + v = uv    [Vector addition is numerical multiplication.]
ku = u^k      [Scalar multiplication is numerical exponentiation.]

Thus, for example, 1 + 1 = 1 and (2)(1) = 1^2 = 1; strange indeed, but nevertheless
the set V with these operations satisfies the ten vector space axioms and hence is a vector
space. We will confirm Axioms 4, 5, and 7, and leave the others as exercises.

Axiom 4: The zero vector in this space is the number 1 (i.e., 0 = 1) since

u + 1 = u · 1 = u

Axiom 5: The negative of a vector u is its reciprocal (i.e., -u = 1/u) since

u + (1/u) = u · (1/u) = 1 (= 0)

Axiom 7: k(u + v) = (uv)^k = u^k v^k = (ku) + (kv).
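
Because the operations in Example 8 are so unusual, a numerical check can help build confidence before attempting the proofs. The sketch below tests several axioms on a few positive numbers; it relies on floating-point arithmetic, so most comparisons use `math.isclose` rather than exact equality, and passing these checks is evidence rather than proof. The names `vadd`, `smul`, `neg`, `ZERO`, and `close` are hypothetical.

```python
import math
from itertools import product

def vadd(u, v):
    """Vector addition in Example 8 is numerical multiplication."""
    return u * v

def smul(k, u):
    """Scalar multiplication in Example 8 is numerical exponentiation."""
    return u ** k

ZERO = 1.0                  # the zero vector of this space is the number 1

def neg(u):
    """The negative of u is its reciprocal."""
    return 1.0 / u

vectors = [0.5, 1.0, 2.0, 3.7]
scalars = [-2.0, 0.0, 0.5, 3.0]
close = lambda a, b: math.isclose(a, b, rel_tol=1e-9)

ok4 = all(vadd(u, ZERO) == u for u in vectors)
ok5 = all(close(vadd(u, neg(u)), ZERO) for u in vectors)
ok7 = all(close(smul(k, vadd(u, v)), vadd(smul(k, u), smul(k, v)))
          for k, u, v in product(scalars, vectors, vectors))
ok8 = all(close(smul(k + m, u), vadd(smul(k, u), smul(m, u)))
          for k, m, u in product(scalars, scalars, vectors))
ok9 = all(close(smul(k, smul(m, u)), smul(k * m, u))
          for k, m, u in product(scalars, scalars, vectors))
ok10 = all(close(smul(1.0, u), u) for u in vectors)

print(ok4, ok5, ok7, ok8, ok9, ok10)   # True True True True True True
```

Each check is a familiar law of exponents or of multiplication in disguise; for example, Axiom 8 is the rule u^(k+m) = u^k · u^m.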


Some Properties of Vectors

The following is our first theorem about vector spaces. The proof is very formal with
each step being justified by a vector space axiom or a known property of real numbers.
There will not be many rigidly formal proofs of this type in the text, but we have included
this one to reinforce the idea that the familiar properties of vectors can all be derived
from the vector space axioms.
THEOREM 4.1.1 Let V be a vector space, u a vector in V, and k a scalar; then:

(a) 0u = 0
(b) k0 = 0
(c) (-1)u = -u
(d) If ku = 0, then k = 0 or u = 0.


We will prove parts (a) and (c) and leave proofs of the remaining parts as exercises.

Proof (a) We can write

0u + 0u = (0 + 0)u    [Axiom 8]
        = 0u          [Property of the number 0]

By Axiom 5 the vector 0u has a negative, -0u. Adding this negative to both sides above
yields

[0u + 0u] + (-0u) = 0u + (-0u)

or

0u + [0u + (-0u)] = 0u + (-0u)    [Axiom 3]
0u + 0 = 0                        [Axiom 5]
0u = 0                            [Axiom 4]

Proof (c) To prove that (-1)u = -u, we must show that u + (-1)u = 0. The proof is
as follows:

u + (-1)u = 1u + (-1)u      [Axiom 10]
          = (1 + (-1))u     [Axiom 8]
          = 0u              [Property of numbers]
          = 0               [Part (a) of this theorem]


A Closing Observation

This section of the text is important to the overall plan of linear algebra in that it
establishes a common thread among such diverse mathematical objects as geometric vectors,
vectors in R^n, infinite sequences, matrices, and real-valued functions, to name a few.
As a result, whenever we discover a new theorem about general vector spaces, we will
at the same time be discovering a theorem about geometric vectors, vectors in R^n,
sequences, matrices, real-valued functions, and about any new kinds of vectors that we
might discover.

To illustrate this idea, consider what the rather innocent-looking result in part (a)
of Theorem 4.1.1 says about the vector space in Example 8. Keeping in mind that the
vectors in that space are positive real numbers, that scalar multiplication means numerical
exponentiation, and that the zero vector is the number 1, the equation

0u = 0

is really a statement of the familiar fact that if u is a positive real number, then

u^0 = 1
Exercise Set 4.1

1. Let V be the set of all ordered pairs of real numbers, and
consider the following addition and scalar multiplication operations
on u = (u_1, u_2) and v = (v_1, v_2):

u + v = (u_1 + v_1, u_2 + v_2), ku = (0, ku_2)

(a) Compute u + v and ku for u = (-1, 2), v = (3, 4), and
k = 3.

(b) In words, explain why V is closed under addition and 
scalar multiplication. 

(c) Since addition on V is the standard addition operation on 
R 2 , certain vector space axioms hold for V because they 
are known to hold for R 2 . Which axioms are they? 

(d) Show that Axioms 7, 8, and 9 hold. 

(e) Show that Axiom 10 fails and hence that V is not a vector 
space under the given operations. 

2. Let V be the set of all ordered pairs of real numbers, and
consider the following addition and scalar multiplication operations
on u = (u_1, u_2) and v = (v_1, v_2):

u + v = (u_1 + v_1 + 1, u_2 + v_2 + 1), ku = (ku_1, ku_2)

(a) Compute u + v and ku for u = (0, 4), v = (1, -3), and
k = 2.

(b) Show that (0, 0) ≠ 0.

(c) Show that (-1, -1) = 0.

(d) Show that Axiom 5 holds by producing an ordered pair
-u such that u + (-u) = 0 for u = (u_1, u_2).

(e) Find two vector space axioms that fail to hold. 

In Exercises 3-12, determine whether each set equipped with
the given operations is a vector space. For those that are not vector
spaces identify the vector space axioms that fail.

3. The set of all real numbers with the standard operations of 
addition and multiplication. 

4. The set of all pairs of real numbers of the form (x, 0) with the 
standard operations on R 2 . 

5. The set of all pairs of real numbers of the form (x, y), where 
x > 0, with the standard operations on R 2 . 

6. The set of all n-tuples of real numbers that have the form
(x, x, ..., x) with the standard operations on R^n.

7. The set of all triples of real numbers with the standard vector
addition but with scalar multiplication defined by

k(x, y, z) = (k^2 x, k^2 y, k^2 z)

8. The set of all 2 x 2 invertible matrices with the standard ma- 
trix addition and scalar multiplication. 


9. The set of all 2 × 2 matrices of the form

[ a  0 ]
[ 0  b ]

with the standard matrix addition and scalar multiplication.

10. The set of all real-valued functions f defined everywhere on
the real line and such that f(1) = 0 with the operations used
in Example 6.

11. The set of all pairs of real numbers of the form (1, x) with the
operations

(1, y) + (1, y') = (1, y + y') and k(1, y) = (1, ky)

12. The set of polynomials of the form a_0 + a_1 x with the operations

(a_0 + a_1 x) + (b_0 + b_1 x) = (a_0 + b_0) + (a_1 + b_1)x

and

k(a_0 + a_1 x) = (ka_0) + (ka_1)x

13. Verify Axioms 3, 7, 8, and 9 for the vector space given in Ex- 
ample 4. 

14. Verify Axioms 1, 2, 3, 7, 8, 9, and 10 for the vector space given
in Example 6.

15. With the addition and scalar multiplication operations defined
in Example 7, show that V = R^2 satisfies Axioms 1-9.

16. Verify Axioms 1, 2, 3, 6, 8, 9, and 10 for the vector space given
in Example 8.

17. Show that the set of all points in R 2 lying on a line is a vector 
space with respect to the standard operations of vector ad- 
dition and scalar multiplication if and only if the line passes 
through the origin. 

18. Show that the set of all points in R^3 lying in a plane is a vector
space with respect to the standard operations of vector addition
and scalar multiplication if and only if the plane passes
through the origin.

In Exercises 19-20, let V be the vector space of positive real
numbers with the vector space operations given in Example 8. Let
u = u be any vector in V, and rewrite the vector statement as a
statement about real numbers.

19. -u = (-1)u

20. ku = 0 if and only if k = 0 or u = 0.

Working with Proofs 

21. The argument that follows proves that if u, v, and w are vectors 
in a vector space V such that u + w = v + w, then u = v (the 
cancellation law for vector addition). As illustrated, justify the 
steps by filling in the blanks. 
u + w = v + w                          Hypothesis
(u + w) + (-w) = (v + w) + (-w)        Add -w to both sides.
u + [w + (-w)] = v + [w + (-w)]        ______
u + 0 = v + 0                          ______
u = v                                  ______

22. Below is a seven-step proof of part (b) of Theorem 4.1.1.
Justify each step either by stating that it is true by hypothesis
or by specifying which of the ten vector space axioms applies.

Hypothesis: Let u be any vector in a vector space V, let 0 be
the zero vector in V, and let k be a scalar.

Conclusion: Then k0 = 0.

Proof: (1) k0 + ku = k(0 + u)

(2) = ku

(3) Since ku is in V, -ku is in V.

(4) Therefore, (k0 + ku) + (-ku) = ku + (-ku).

(5) k0 + (ku + (-ku)) = ku + (-ku)

(6) k0 + 0 = 0

(7) k0 = 0

In Exercises 23-24, let u be any vector in a vector space V.
Give a step-by-step proof of the stated result using Exercises 21
and 22 as models for your presentation.

23. 0u = 0    24. -u = (-1)u

In Exercises 25-27, prove that the given set with the stated 
operations is a vector space. 


25. The set V = {0} with the operations of addition and scalar
multiplication given in Example 1.

26. The set R^∞ of all infinite sequences of real numbers with the
operations of addition and scalar multiplication given in
Example 3.

27. The set M_mn of all m × n matrices with the usual operations
of addition and scalar multiplication.

28. Prove: If u is a vector in a vector space V and k a scalar such
that ku = 0, then either k = 0 or u = 0. [Suggestion: Show
that if ku = 0 and k ≠ 0, then u = 0. The result then follows
as a logical consequence of this.]

True-False Exercises 

TF. In parts (a)-(f) determine whether the statement is true or 

false, and justify your answer. 

(a) A vector is any element of a vector space. 

(b) A vector space must contain at least two vectors. 

(c) If u is a vector and k is a scalar such that ku = 0, then it must
be true that k = 0.

(d) The set of positive real numbers is a vector space if vector
addition and scalar multiplication are the usual operations of
addition and multiplication of real numbers.

(e) In every vector space the vectors (-1)u and -u are the same.

(f) In the vector space F(-∞, ∞) any function whose graph passes
through the origin is a zero vector.


4.2 Subspaces 

It is often the case that some vector space of interest is contained within a larger vector space 
whose properties are known. In this section we will show how to recognize when this is the 
case, we will explain how the properties of the larger vector space can be used to obtain 
properties of the smaller vector space, and we will give a variety of important examples. 

We begin with some terminology. 


DEFINITION 1 A subset W of a vector space V is called a subspace of V if W is itself
a vector space under the addition and scalar multiplication defined on V.


In general, to show that a nonempty set W with two operations is a vector space one
must verify the ten vector space axioms. However, if W is a subspace of a known vector
space V, then certain axioms need not be verified because they are “inherited” from V.
For example, it is not necessary to verify that u + v = v + u holds in W because it holds
for all vectors in V including those in W. On the other hand, it is necessary to verify
that W is closed under addition and scalar multiplication since it is possible that adding
two vectors in W or multiplying a vector in W by a scalar produces a vector in V that is
outside of W (Figure 4.2.1). Those axioms that are not inherited by W are

Axiom 1: Closure of W under addition
Axiom 4: Existence of a zero vector in W
Axiom 5: Existence of a negative in W for every vector in W
Axiom 6: Closure of W under scalar multiplication

so these must be verified to prove that it is a subspace of V. However, the next theorem
shows that if Axiom 1 and Axiom 6 hold in W, then Axioms 4 and 5 hold in W as a
consequence and hence need not be verified.


► Figure 4.2.1 The vectors u and v are in W, but the vectors u + v and ku are not.


THEOREM 4.2.1 If W is a set of one or more vectors in a vector space V, then W is a
subspace of V if and only if the following conditions are satisfied.

(a) If u and v are vectors in W, then u + v is in W.

(b) If k is a scalar and u is a vector in W, then ku is in W.


Theorem 4.2.1 states that W is 
a subspace of V if and only if 
it is closed under addition and 
scalar multiplication. 


Proof If W is a subspace of V, then all the vector space axioms hold in W, including
Axioms 1 and 6, which are precisely conditions (a) and (b).

Conversely, assume that conditions (a) and (b) hold. Since these are Axioms 1 and
6, and since Axioms 2, 3, 7, 8, 9, and 10 are inherited from V, we only need to show
that Axioms 4 and 5 hold in W. For this purpose, let u be any vector in W. It follows
from condition (b) that ku is a vector in W for every scalar k. In particular, 0u = 0 and
(-1)u = -u are in W, which shows that Axioms 4 and 5 hold in W.
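
The two closure conditions of the subspace test lend themselves to a quick experiment. The following Python sketch checks conditions (a) and (b) on finite samples for two subsets of R^2: a line through the origin and a parallel line that misses the origin. It is an illustration under stated assumptions; the function `closed_on_samples`, the membership predicates, and the sample lists are all hypothetical names, and a passing check on samples is not a proof of closure, whereas a failing check does exhibit a counterexample.

```python
from itertools import product

def closed_on_samples(contains, samples, scalars):
    """Test conditions (a) and (b) of the subspace test on finite samples.

    Returns a pair (closed_under_addition, closed_under_scalar_multiplication).
    """
    add_ok = all(contains(tuple(a + b for a, b in zip(u, v)))
                 for u, v in product(samples, repeat=2))
    mul_ok = all(contains(tuple(k * a for a in u))
                 for k, u in product(scalars, samples))
    return add_ok, mul_ok

# W1: the line y = 2x through the origin.  W2: the shifted line y = 2x + 1.
on_line = lambda p: p[1] == 2 * p[0]
on_shifted_line = lambda p: p[1] == 2 * p[0] + 1

samples_1 = [(t, 2 * t) for t in range(-3, 4)]
samples_2 = [(t, 2 * t + 1) for t in range(-3, 4)]
scalars = [-2, 0, 1, 3]

print(closed_on_samples(on_line, samples_1, scalars))          # (True, True)
print(closed_on_samples(on_shifted_line, samples_2, scalars))  # (False, False)
```

For the shifted line, the scalar k = 0 already sends every sample to (0, 0), which is not on the line, so the set cannot be a subspace.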


Note that every vector space 
has at least two subspaces, it- 
self and its zero subspace. 


► EXAMPLE 1 The Zero Subspace 

If V is any vector space, and if W = {0} is the subset of V that consists of the zero vector
only, then W is closed under addition and scalar multiplication since

0 + 0 = 0 and k0 = 0


for any scalar k. We call W the zero subspace of V.


► EXAMPLE 2 Lines Through the Origin Are Subspaces of R^2 and of R^3

If W is a line through the origin of either R^2 or R^3, then adding two vectors on the line
or multiplying a vector on the line by a scalar produces another vector on the line, so
W is closed under addition and scalar multiplication (see Figure 4.2.2 for an illustration
in R^3).


► Figure 4.2.2 (a) W is closed under addition. (b) W is closed under scalar multiplication.

▲ Figure 4.2.3 The vectors u + v and ku both lie in the same plane as u and v.

▲ Figure 4.2.4 W is not closed under scalar multiplication.


► EXAMPLE 3 Planes Through the Origin Are Subspaces of R^3

If u and v are vectors in a plane W through the origin of R^3, then it is evident geometrically
that u + v and ku also lie in the same plane W for any scalar k (Figure 4.2.3). Thus W
is closed under addition and scalar multiplication.

Table 1 below gives a list of subspaces of R^2 and of R^3 that we have encountered thus
far. We will see later that these are the only subspaces of R^2 and of R^3.


Table 1

Subspaces of R^2              Subspaces of R^3
• {0}                         • {0}
• Lines through the origin    • Lines through the origin
• R^2                         • Planes through the origin
                              • R^3


► EXAMPLE 4 A Subset of R 2 That Is Not a Subspace 

Let W be the set of all points (x, y) in R^2 for which x > 0 and y > 0 (the shaded region
in Figure 4.2.4). This set is not a subspace of R^2 because it is not closed under scalar
multiplication. For example, v = (1, 1) is a vector in W, but (-1)v = (-1, -1) is not.

► EXAMPLE 5 Subspaces of M_nn

We know from Theorem 1.7.2 that the sum of two symmetric n × n matrices is symmetric
and that a scalar multiple of a symmetric n × n matrix is symmetric. Thus, the set of
symmetric n × n matrices is closed under addition and scalar multiplication and hence
is a subspace of M_nn. Similarly, the sets of upper triangular matrices, lower triangular
matrices, and diagonal matrices are subspaces of M_nn.


► EXAMPLE 6 A Subset of M_nn That Is Not a Subspace

The set W of invertible n × n matrices is not a subspace of M_nn, failing on two counts: it
is not closed under addition and not closed under scalar multiplication. We will illustrate
this with an example in M_22 that you can readily adapt to M_nn. Consider the matrices

\[ U = \begin{bmatrix} 1 & 2 \\ 2 & 5 \end{bmatrix} \quad \text{and} \quad V = \begin{bmatrix} -1 & 2 \\ -2 & 5 \end{bmatrix} \]

The matrix 0U is the 2 × 2 zero matrix and hence is not invertible, and the matrix U + V
has a column of zeros, so it also is not invertible.
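The two failures described in Example 6 can be checked numerically. The following is only a minimal sketch: the helper name `det2` is hypothetical, and the entries U = [[1, 2], [2, 5]] and V = [[-1, 2], [-2, 5]] are assumed to be the matrices intended in the example.

```python
def det2(m):
    """Determinant of a 2x2 matrix given as nested lists."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

U = [[1, 2], [2, 5]]
V = [[-1, 2], [-2, 5]]   # assumed entries of the second matrix

# Scalar multiple 0U and sum U + V, computed entry by entry.
zero_times_U = [[0 * x for x in row] for row in U]
U_plus_V = [[a + b for a, b in zip(ru, rv)] for ru, rv in zip(U, V)]

print(det2(U), det2(V))            # both nonzero, so U and V are invertible
print(det2(zero_times_U))          # zero determinant: 0U is not invertible
print(U_plus_V, det2(U_plus_V))    # first column is zero, determinant is zero
```

A nonzero determinant is used here as the test for invertibility of a square matrix.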
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CALCULUS REQUIRED 


In this text we regard all constants to be polynomials of degree zero. Be aware, however,
that some authors do not assign a degree to the constant 0.


The Hierarchy of Function 
Spaces 


► EXAMPLE 7 The Subspace C(-∞, ∞)

There is a theorem in calculus which states that a sum of continuous functions is
continuous and that a constant times a continuous function is continuous. Rephrased in
vector language, the set of continuous functions on (-∞, ∞) is a subspace of F(-∞, ∞).
We will denote this subspace by C(-∞, ∞).


► EXAMPLE 8 Functions with Continuous Derivatives

A function with a continuous derivative is said to be continuously differentiable. There
is a theorem in calculus which states that the sum of two continuously differentiable
functions is continuously differentiable and that a constant times a continuously
differentiable function is continuously differentiable. Thus, the functions that are continuously
differentiable on (-∞, ∞) form a subspace of F(-∞, ∞). We will denote this subspace
by C^1(-∞, ∞), where the superscript emphasizes that the first derivatives are continuous.
To take this a step further, the set of functions with m continuous derivatives on (-∞, ∞)
is a subspace of F(-∞, ∞), as is the set of functions with derivatives of all orders on
(-∞, ∞). We will denote these subspaces by C^m(-∞, ∞) and C^∞(-∞, ∞), respectively.


► EXAMPLE 9 The Subspace of All Polynomials

Recall that a polynomial is a function that can be expressed in the form

p(x) = a_0 + a_1 x + ... + a_n x^n    (1)

where a_0, a_1, ..., a_n are constants. It is evident that the sum of two polynomials is a
polynomial and that a constant times a polynomial is a polynomial. Thus, the set W of all
polynomials is closed under addition and scalar multiplication and hence is a subspace
of F(-∞, ∞). We will denote this space by P_∞.


► EXAMPLE 10 The Subspace of Polynomials of Degree ≤ n

Recall that the degree of a polynomial is the highest power of the variable that occurs with
a nonzero coefficient. Thus, for example, if a_n ≠ 0 in Formula (1), then that polynomial
has degree n. It is not true that the set W of polynomials with positive degree n is a
subspace of F(-∞, ∞) because that set is not closed under addition. For example, the
polynomials

1 + 2x + 3x^2 and 5 + 7x - 3x^2

both have degree 2, but their sum has degree 1. What is true, however, is that for each
nonnegative integer n the polynomials of degree n or less form a subspace of F(-∞, ∞).
We will denote this space by P_n.
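The loss of degree under addition in Example 10 can be seen by storing each polynomial as a list of coefficients. This is a small illustrative sketch: the helper names `poly_add` and `degree` are hypothetical, and the polynomials are taken to be 1 + 2x + 3x^2 and 5 + 7x - 3x^2.

```python
def poly_add(p, q):
    """Add polynomials stored as coefficient lists [a0, a1, ..., an]."""
    n = max(len(p), len(q))
    p = p + [0] * (n - len(p))   # pad the shorter list with zeros
    q = q + [0] * (n - len(q))
    return [a + b for a, b in zip(p, q)]

def degree(p):
    """Highest power with a nonzero coefficient.

    Follows this text's convention that every constant, including 0,
    has degree zero.
    """
    for i in range(len(p) - 1, -1, -1):
        if p[i] != 0:
            return i
    return 0

p = [1, 2, 3]     # 1 + 2x + 3x^2
q = [5, 7, -3]    # 5 + 7x - 3x^2
s = poly_add(p, q)

print(degree(p), degree(q))   # both polynomials have degree 2
print(s, degree(s))           # the x^2 terms cancel, leaving degree 1
```

The sum still lies in P_2, which is why the polynomials of degree n or less, rather than those of degree exactly n, form a subspace.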

It is proved in calculus that polynomials are continuous functions and have continuous
derivatives of all orders on (-∞, ∞). Thus, it follows that P_∞ is not only a subspace of
F(-∞, ∞), as previously observed, but is also a subspace of C^∞(-∞, ∞). We leave it
for you to convince yourself that the vector spaces discussed in Examples 7 to 10 are
“nested” one inside the other as illustrated in Figure 4.2.5.

Remark In our previous examples we considered functions that were defined at all points of the
interval (-∞, ∞). Sometimes we will want to consider functions that are only defined on some
subinterval of (-∞, ∞), say the closed interval [a, b] or the open interval (a, b). In such cases
we will make an appropriate notation change. For example, C[a, b] is the space of continuous
functions on [a, b] and C(a, b) is the space of continuous functions on (a, b).

Building Subspaces


The following theorem provides a useful way of creating a new subspace from known
subspaces.


THEOREM 4.2.2 If W_1, W_2, ..., W_r are subspaces of a vector space V, then the
intersection of these subspaces is also a subspace of V.


Note that the first step in proving Theorem 4.2.2 was to establish that W contained
at least one vector. This is important, for otherwise the subsequent argument might be
logically correct but meaningless.


Proof Let W be the intersection of the subspaces W_1, W_2, ..., W_r. This set is not
empty because each of these subspaces contains the zero vector of V, and hence so does
their intersection. Thus, it remains to show that W is closed under addition and scalar
multiplication.

To prove closure, let u and v be vectors in W and let k be any scalar. Since W is the
intersection of W_1, W_2, ..., W_r, it follows that u and v also lie in each of these subspaces.
Moreover, since these subspaces are closed under addition and scalar multiplication, they
also all contain the vectors u + v and ku, and hence so does their intersection W.
This proves that W is closed under addition and scalar multiplication.

Sometimes we will want to find the “smallest” subspace of a vector space V that
contains all of the vectors in some set of interest. The following definition, which generalizes
Definition 4 of Section 3.1, will help us to do that.


If r = 1, then Equation (2) has the form w = k_1 v_1, in which case the linear
combination is just a scalar multiple of v_1.


DEFINITION 2 If w is a vector in a vector space V, then w is said to be a linear
combination of the vectors v_1, v_2, ..., v_r in V if w can be expressed in the form

w = k_1 v_1 + k_2 v_2 + ... + k_r v_r    (2)

where k_1, k_2, ..., k_r are scalars. These scalars are called the coefficients of the linear
combination.


THEOREM 4.2.3 If S = {w_1, w_2, ..., w_r} is a nonempty set of vectors in a vector space
V, then:

(a) The set W of all possible linear combinations of the vectors in S is a subspace of V.

(b) The set W in part (a) is the “smallest” subspace of V that contains all of the vectors
in S in the sense that any other subspace that contains those vectors contains W.


Proof (a) Let W be the set of all possible linear combinations of the vectors in S. We
must show that W is closed under addition and scalar multiplication. To prove closure
under addition, let

u = c_1 w_1 + c_2 w_2 + ... + c_r w_r and v = k_1 w_1 + k_2 w_2 + ... + k_r w_r

be two vectors in W. It follows that their sum can be written as

u + v = (c_1 + k_1)w_1 + (c_2 + k_2)w_2 + ... + (c_r + k_r)w_r

which is a linear combination of the vectors in S. Thus, W is closed under addition. We
leave it for you to prove that W is also closed under scalar multiplication and hence is a
subspace of V.

Proof (b) Let W' be any subspace of V that contains all of the vectors in S. Since W'
is closed under addition and scalar multiplication, it contains all linear combinations of
the vectors in S and hence contains W.

The following definition gives some important notation and terminology related to 
Theorem 4.2.3. 


In the case where S is the empty set, it will be convenient to agree that span(∅) = {0}.


DEFINITION 3 If S = {w_1, w_2, ..., w_r} is a nonempty set of vectors in a vector space
V, then the subspace W of V that consists of all possible linear combinations of the
vectors in S is called the subspace of V generated by S, and we say that the vectors
w_1, w_2, ..., w_r span W. We denote this subspace as

W = span{w_1, w_2, ..., w_r} or W = span(S)


► EXAMPLE 11 The Standard Unit Vectors Span R^n

Recall that the standard unit vectors in R^n are

e_1 = (1, 0, 0, ..., 0), e_2 = (0, 1, 0, ..., 0), ..., e_n = (0, 0, 0, ..., 1)

These vectors span R^n since every vector v = (v_1, v_2, ..., v_n) in R^n can be expressed as

v = v_1 e_1 + v_2 e_2 + ... + v_n e_n

which is a linear combination of e_1, e_2, ..., e_n. Thus, for example, the vectors

i = (1, 0, 0), j = (0, 1, 0), k = (0, 0, 1)

span R^3 since every vector v = (a, b, c) in this space can be expressed as

v = (a, b, c) = a(1, 0, 0) + b(0, 1, 0) + c(0, 0, 1) = ai + bj + ck

► EXAMPLE 12 A Geometric View of Spanning in R^2 and R^3

(a) If v is a nonzero vector in R^2 or R^3 that has its initial point at the origin, then span{v},
which is the set of all scalar multiples of v, is the line through the origin determined
by v. You should be able to visualize this from Figure 4.2.6a by observing that the
tip of the vector kv can be made to fall at any point on the line by choosing the
value of k to lengthen, shorten, or reverse the direction of v appropriately.

(b) If v_1 and v_2 are nonzero vectors in R^3 that have their initial points at the origin,
then span{v_1, v_2}, which consists of all linear combinations of v_1 and v_2, is the plane
through the origin determined by these two vectors. You should be able to visualize
this from Figure 4.2.6b by observing that the tip of the vector k_1 v_1 + k_2 v_2 can be
made to fall at any point in the plane by adjusting the scalars k_1 and k_2 to lengthen,
shorten, or reverse the directions of the vectors k_1 v_1 and k_2 v_2 appropriately.


► Figure 4.2.6


George William Hill
(1838-1914)

Historical Note The term linear combination is due to the American
mathematician G. W. Hill, who introduced it in a research paper on
planetary motion published in 1900. Hill was a "loner" who preferred to
work out of his home in West Nyack, New York, rather than in academia,
though he did try lecturing at Columbia University for a few years.
Interestingly, he apparently returned the teaching salary, indicating that
he did not need the money and did not want to be bothered looking
after it. Although technically a mathematician, Hill had little interest in
modern developments of mathematics and worked almost entirely on
the theory of planetary orbits.

[Image: Courtesy of the American Mathematical Society, www.ams.org]


► EXAMPLE 13 A Spanning Set for P_n

The polynomials 1, x, x^2, ..., x^n span the vector space P_n defined in Example 10 since
each polynomial p in P_n can be written as

p = a_0 + a_1 x + ... + a_n x^n

which is a linear combination of 1, x, x^2, ..., x^n. We can denote this by writing

P_n = span{1, x, x^2, ..., x^n}

The next two examples are concerned with two important types of problems:

• Given a nonempty set S of vectors in R^n and a vector v in R^n, determine whether v is
a linear combination of the vectors in S.

• Given a nonempty set S of vectors in R^n, determine whether the vectors span R^n.


► EXAMPLE 14 Linear Combinations

Consider the vectors u = (1, 2, -1) and v = (6, 4, 2) in R^3. Show that w = (9, 2, 7) is
a linear combination of u and v and that w' = (4, -1, 8) is not a linear combination of
u and v.

Solution In order for w to be a linear combination of u and v, there must be scalars k_1
and k_2 such that w = k_1 u + k_2 v; that is,

(9, 2, 7) = k_1(1, 2, -1) + k_2(6, 4, 2) = (k_1 + 6k_2, 2k_1 + 4k_2, -k_1 + 2k_2)

Equating corresponding components gives

k_1 + 6k_2 = 9
2k_1 + 4k_2 = 2
-k_1 + 2k_2 = 7

Solving this system using Gaussian elimination yields k_1 = -3, k_2 = 2, so

w = -3u + 2v

Solution Spaces of Homogeneous Systems

Similarly, for w' to be a linear combination of u and v, there must be scalars k_1 and
k_2 such that w' = k_1 u + k_2 v; that is,

(4, -1, 8) = k_1(1, 2, -1) + k_2(6, 4, 2) = (k_1 + 6k_2, 2k_1 + 4k_2, -k_1 + 2k_2)

Equating corresponding components gives

k_1 + 6k_2 = 4
2k_1 + 4k_2 = -1
-k_1 + 2k_2 = 8

This system of equations is inconsistent (verify), so no such scalars k_1 and k_2 exist.
Consequently, w' is not a linear combination of u and v.
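Both systems in Example 14 can be checked by row reducing their augmented matrices. The sketch below is one possible way to do this with exact fractions; the function names `rref` and `is_consistent` are hypothetical helpers, not notation from the text.

```python
from fractions import Fraction

def rref(rows):
    """Reduced row echelon form of a matrix given as a list of rows."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivot_row = 0
    for col in range(len(m[0])):
        # Find a row at or below pivot_row with a nonzero entry in this column.
        pr = next((r for r in range(pivot_row, len(m)) if m[r][col] != 0), None)
        if pr is None:
            continue
        m[pivot_row], m[pr] = m[pr], m[pivot_row]
        piv = m[pivot_row][col]
        m[pivot_row] = [x / piv for x in m[pivot_row]]
        # Clear the column above and below the leading 1.
        for r in range(len(m)):
            if r != pivot_row and m[r][col] != 0:
                f = m[r][col]
                m[r] = [a - f * b for a, b in zip(m[r], m[pivot_row])]
        pivot_row += 1
        if pivot_row == len(m):
            break
    return m

def is_consistent(aug):
    """A system is inconsistent exactly when its RREF has a row 0 ... 0 | c, c != 0."""
    return not any(all(x == 0 for x in row[:-1]) and row[-1] != 0
                   for row in rref(aug))

u, v = (1, 2, -1), (6, 4, 2)
w, w_prime = (9, 2, 7), (4, -1, 8)

# Columns of each augmented matrix: u, v, and the target vector.
aug_w = [[u[i], v[i], w[i]] for i in range(3)]
aug_w_prime = [[u[i], v[i], w_prime[i]] for i in range(3)]

reduced = rref(aug_w)
k1, k2 = reduced[0][2], reduced[1][2]
print(k1, k2)                        # coefficients with w = k1*u + k2*v
print(is_consistent(aug_w_prime))    # False: w' is not in span{u, v}
```

Using `Fraction` rather than floating-point numbers keeps the elimination exact, so the consistency test does not depend on a rounding tolerance.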


► EXAMPLE 15 Testing for Spanning

Determine whether the vectors v_1 = (1, 1, 2), v_2 = (1, 0, 1), and v_3 = (2, 1, 3) span the
vector space R^3.

Solution We must determine whether an arbitrary vector b = (b_1, b_2, b_3) in R^3 can be
expressed as a linear combination

b = k_1 v_1 + k_2 v_2 + k_3 v_3

of the vectors v_1, v_2, and v_3. Expressing this equation in terms of components gives

(b_1, b_2, b_3) = k_1(1, 1, 2) + k_2(1, 0, 1) + k_3(2, 1, 3)

or

(b_1, b_2, b_3) = (k_1 + k_2 + 2k_3, k_1 + k_3, 2k_1 + k_2 + 3k_3)

or

k_1 + k_2 + 2k_3 = b_1
k_1 + k_3 = b_2
2k_1 + k_2 + 3k_3 = b_3

Thus, our problem reduces to ascertaining whether this system is consistent for all values
of b_1, b_2, and b_3. One way of doing this is to use parts (e) and (g) of Theorem 2.3.8,
which state that the system is consistent for every choice of b_1, b_2, and b_3 if and only if
its coefficient matrix

\[ A = \begin{bmatrix} 1 & 1 & 2 \\ 1 & 0 & 1 \\ 2 & 1 & 3 \end{bmatrix} \]

has a nonzero determinant. But this is not the case here since det(A) = 0 (verify), so v_1,
v_2, and v_3 do not span R^3.
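The determinant test used in Example 15 is easy to carry out by hand or by machine. Here is a minimal sketch using cofactor expansion along the first row; the helper name `det3` is hypothetical.

```python
def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

v1, v2, v3 = (1, 1, 2), (1, 0, 1), (2, 1, 3)

# The columns of A are v1, v2, v3, as in the system for k1, k2, k3.
A = [[v1[r], v2[r], v3[r]] for r in range(3)]

print(A)
print(det3(A))   # zero, so the three vectors do not span R^3

# One way to see why: v3 is the sum of v1 and v2.
print(tuple(x + y for x, y in zip(v1, v2)) == v3)
```

Since v_3 = v_1 + v_2, the three vectors span only the plane through the origin determined by v_1 and v_2.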


The solutions of a homogeneous linear system Ax = 0 of m equations in n unknowns
can be viewed as vectors in R^n. The following theorem provides a useful insight into the
geometric structure of the solution set.


THEOREM 4.2.4 The solution set of a homogeneous linear system Ax = 0 of m
equations in n unknowns is a subspace of R^n.


Proof Let W be the solution set of the system. The set W is not empty because it
contains at least the trivial solution x = 0.

To show that W is a subspace of R^n, we must show that it is closed under addition
and scalar multiplication. To do this, let x_1 and x_2 be vectors in W. Since these vectors
are solutions of Ax = 0, we have

Ax_1 = 0 and Ax_2 = 0

It follows from these equations and the distributive property of matrix multiplication
that

A(x_1 + x_2) = Ax_1 + Ax_2 = 0 + 0 = 0

so W is closed under addition. Similarly, if k is any scalar then

A(kx_1) = kAx_1 = k0 = 0

so W is also closed under scalar multiplication.

Because the solution set of a homogeneous system in n unknowns is actually a
subspace of R^n, we will generally refer to it as the solution space of the system.


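► EXAMPLE 16 Solution Spaces of Homogeneous Systems

In each part, solve the system by any method and then give a geometric description of
the solution set.

(a) \[ \begin{bmatrix} 1 & -2 & 3 \\ 2 & -4 & 6 \\ 3 & -6 & 9 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} \]

(b) \[ \begin{bmatrix} 1 & -2 & 3 \\ -3 & 7 & -8 \\ -2 & 4 & -6 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} \]

(c) \[ \begin{bmatrix} 1 & -2 & 3 \\ -3 & 7 & -8 \\ 4 & 1 & 2 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} \]

(d) \[ \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} \]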

Solution

(a) The solutions are

x = 2s - 3t, y = s, z = t

from which it follows that

x = 2y - 3z or x - 2y + 3z = 0

This is the equation of a plane through the origin that has n = (1, -2, 3) as a
normal.

(b) The solutions are

x = -5t, y = -t, z = t

which are parametric equations for the line through the origin that is parallel to the
vector v = (-5, -1, 1).

(c) The only solution is x = 0, y = 0, z = 0, so the solution space consists of the single
point {0}.

(d) This linear system is satisfied by all real values of x, y, and z, so the solution space
is all of R^3.
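The four geometric outcomes in Example 16 correspond to solution spaces with 2, 1, 0, and 3 free parameters. The sketch below counts the free parameters as 3 minus the number of pivots found by elimination. The function name `rank` is a hypothetical helper, and the third row of the matrix in part (b) is assumed to be (-2, 4, -6), which is consistent with the solution stated for that part.

```python
from fractions import Fraction

def rank(rows):
    """Number of pivots found by forward elimination with exact fractions."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for col in range(len(m[0])):
        pr = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pr is None:
            continue
        m[r], m[pr] = m[pr], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][col] / m[r][col]
            m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r

systems = {
    "a": [[1, -2, 3], [2, -4, 6], [3, -6, 9]],
    "b": [[1, -2, 3], [-3, 7, -8], [-2, 4, -6]],   # third row is an assumption
    "c": [[1, -2, 3], [-3, 7, -8], [4, 1, 2]],
    "d": [[0, 0, 0], [0, 0, 0], [0, 0, 0]],
}

# Free parameters: 2 = plane, 1 = line, 0 = origin only, 3 = all of R^3.
free = {name: 3 - rank(A) for name, A in systems.items()}
print(free)

# Spot check of the parametric solution in part (b) with t = 1.
x = (-5, -1, 1)
print([sum(a * b for a, b in zip(row, x)) for row in systems["b"]])
```

The count of free parameters is used here only as a bookkeeping device for the four cases; the text develops the underlying ideas later.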


Remark Whereas the solution set of every homogeneous system of m equations in n unknowns is
a subspace of R^n, it is never true that the solution set of a nonhomogeneous system of m equations
in n unknowns is a subspace of R^n. There are two possible scenarios: first, the system may not
have any solutions at all, and second, if there are solutions, then the solution set will not be closed
either under addition or under scalar multiplication (Exercise 18).

The Linear Transformation Viewpoint

Theorem 4.2.4 can be viewed as a statement about matrix transformations by letting
T_A: R^n → R^m be multiplication by the coefficient matrix A. From this point of view
the solution space of Ax = 0 is the set of vectors in R^n that T_A maps into the zero
vector in R^m. This set is sometimes called the kernel of the transformation, so with this
terminology Theorem 4.2.4 can be rephrased as follows.


THEOREM 4.2.5 If A is an m × n matrix, then the kernel of the matrix transformation
T_A: R^n → R^m is a subspace of R^n.


A Concluding Observation

It is important to recognize that spanning sets are not unique. For example, any nonzero
vector on the line in Figure 4.2.6a will span that line, and any two noncollinear vectors
in the plane in Figure 4.2.6b will span that plane. The following theorem, whose proof
is left as an exercise, states conditions under which two sets of vectors will span the same
space.


THEOREM 4.2.6 If S = {v_1, v_2, ..., v_r} and S' = {w_1, w_2, ..., w_k} are nonempty sets
of vectors in a vector space V, then

span{v_1, v_2, ..., v_r} = span{w_1, w_2, ..., w_k}

if and only if each vector in S is a linear combination of those in S', and each vector in
S' is a linear combination of those in S.
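For vectors in R^n, the condition in Theorem 4.2.6 can be tested mechanically. The sketch below swaps in a different but equivalent check that is not stated in this section: every vector of each set is a linear combination of the other set exactly when appending one set to the other never increases the number of pivots found by elimination. The names `rank` and `same_span`, and the sample sets S, T, and R, are illustrative assumptions.

```python
from fractions import Fraction

def rank(rows):
    """Number of pivots found by forward elimination with exact fractions."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for col in range(len(m[0])):
        pr = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pr is None:
            continue
        m[r], m[pr] = m[pr], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][col] / m[r][col]
            m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r

def same_span(S, T):
    """True when the two lists of vectors span the same subspace of R^n."""
    return rank(S) == rank(T) == rank(S + T)

S = [(1, 0, 0), (0, 1, 0)]     # spans the xy-plane
T = [(1, 1, 0), (1, -1, 0)]    # a different spanning set for the xy-plane
R = [(1, 0, 0), (0, 0, 1)]     # spans the xz-plane

print(same_span(S, T))   # True
print(same_span(S, R))   # False
```

The first comparison illustrates the point of the concluding observation: two quite different sets can span the same plane.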


Exercise Set 4.2 

1. Use Theorem 4.2.1 to determine which of the following are
subspaces of R^3.

(a) All vectors of the form (a, 0, 0).

(b) All vectors of the form (a, 1, 1).

(c) All vectors of the form (a, b, c), where b = a + c.

(d) All vectors of the form (a, b, c), where b = a + c + 1.

(e) All vectors of the form (a, b, 0).

2. Use Theorem 4.2.1 to determine which of the following are
subspaces of M_nn.

(a) The set of all diagonal n × n matrices.

(b) The set of all n × n matrices A such that det(A) = 0.

(c) The set of all n × n matrices A such that tr(A) = 0.

(d) The set of all symmetric n × n matrices.

(e) The set of all n × n matrices A such that A^T = -A.

(f) The set of all n × n matrices A for which Ax = 0 has only
the trivial solution.

(g) The set of all n × n matrices A such that AB = BA for
some fixed n × n matrix B.


3. Use Theorem 4.2.1 to determine which of the following are
subspaces of P_3.

(a) All polynomials a_0 + a_1 x + a_2 x^2 + a_3 x^3 for which
a_0 = 0.

(b) All polynomials a_0 + a_1 x + a_2 x^2 + a_3 x^3 for which
a_0 + a_1 + a_2 + a_3 = 0.

(c) All polynomials of the form a_0 + a_1 x + a_2 x^2 + a_3 x^3 in
which a_0, a_1, a_2, and a_3 are rational numbers.

(d) All polynomials of the form a_0 + a_1 x, where a_0 and a_1 are
real numbers.

4. Which of the following are subspaces of F(-∞, ∞)?

(a) All functions f in F(-∞, ∞) for which f(0) = 0.

(b) All functions f in F(-∞, ∞) for which f(0) = 1.

(c) All functions f in F(-∞, ∞) for which f(-x) = f(x).

(d) All polynomials of degree 2.

5. Which of the following are subspaces of R^∞?

(a) All sequences v in R^∞ of the form
v = (v, 0, v, 0, v, 0, ...).

(b) All sequences v in R^∞ of the form
v = (v, 1, v, 1, v, 1, ...).

(c) All sequences v in R^∞ of the form
v = (v, 2v, 4v, 8v, 16v, ...).

(d) All sequences in R^∞ whose components are 0 from some
point on.

6. A line L through the origin in R^3 can be represented by
parametric equations of the form x = at, y = bt, and z = ct. Use
these equations to show that L is a subspace of R^3 by showing
that if v_1 = (x_1, y_1, z_1) and v_2 = (x_2, y_2, z_2) are points on L
and k is any real number, then kv_1 and v_1 + v_2 are also points
on L.

7. Which of the following are linear combinations of
u = (0, -2, 2) and v = (1, 3, -1)?

(a) (2, 2, 2) (b) (0, 4, 5) (c) (0, 0, 0)

8. Express the following as linear combinations of u = (2, 1, 4),
v = (1, -1, 3), and w = (3, 2, 5).

(a) (-9, -7, -15) (b) (6, 11, 6) (c) (0, 0, 0)


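9. Which of the following are linear combinations of

\[ A = \begin{bmatrix} 4 & 0 \\ -2 & -2 \end{bmatrix}, \quad B = \begin{bmatrix} 1 & -1 \\ 2 & 3 \end{bmatrix}, \quad C = \begin{bmatrix} 0 & 2 \\ 1 & 4 \end{bmatrix}? \]

(a) \[ \begin{bmatrix} 6 & -8 \\ -1 & -8 \end{bmatrix} \]

(b) \[ \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix} \]

(c) \[ \begin{bmatrix} -1 & 5 \\ 7 & 1 \end{bmatrix} \]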


10. In each part express the vector as a linear combination of
p_1 = 2 + x + 4x^2, p_2 = 1 - x + 3x^2, and
p_3 = 3 + 2x + 5x^2.

(a) -9 - 7x - 15x^2 (b) 6 + 11x + 6x^2

(c) 0 (d) 7 + 8x + 9x^2

11. In each part, determine whether the vectors span R^3.

(a) v_1 = (2, 2, 2), v_2 = (0, 0, 3), v_3 = (0, 1, 1)

(b) v_1 = (2, -1, 3), v_2 = (4, 1, 2), v_3 = (8, -1, 8)

12. Suppose that v_1 = (2, 1, 0, 3), v_2 = (3, -1, 5, 2), and
v_3 = (-1, 0, 2, 1). Which of the following vectors are in
span{v_1, v_2, v_3}?

(a) (2, 3, -7, 3) (b) (0, 0, 0, 0)

(c) (1, 1, 1, 1) (d) (-4, 6, -13, 4)


origin only. If it is a plane, find an equation for it. If it is a 
line, find parametric equations for it. 


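(a) \[ A = \begin{bmatrix} -1 & 1 & 1 \\ 3 & -1 & 0 \\ 2 & -4 & -5 \end{bmatrix} \]

(b) \[ A = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 5 & 3 \\ 1 & 0 & 8 \end{bmatrix} \]

(c) \[ A = \begin{bmatrix} 1 & -3 & 1 \\ 2 & -6 & 2 \\ 3 & -9 & 3 \end{bmatrix} \]

(d) \[ A = \begin{bmatrix} 1 & -1 & 1 \\ 2 & -1 & 4 \\ 3 & 1 & 11 \end{bmatrix} \]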

16. (Calculus required) Show that the following sets of functions
are subspaces of F(-∞, ∞).

(a) All continuous functions on (-∞, ∞).

(b) All differentiable functions on (-∞, ∞).

(c) All differentiable functions on (-∞, ∞) that satisfy
f' + 2f = 0.


17. (Calculus required) Show that the set of continuous functions
f = f(x) on [a, b] such that

\[ \int_a^b f(x)\, dx = 0 \]

is a subspace of C[a, b].


18. Show that the solution vectors of a consistent nonhomogeneous
system of m linear equations in n unknowns do not
form a subspace of R^n.

19. In each part, let T_A: R^2 → R^2 be multiplication by A, and
let u_1 = (1, 2) and u_2 = (-1, 1). Determine whether the set
{T_A(u_1), T_A(u_2)} spans R^2.


(a) A = 


1 

0 


-1 

2 


(b) A = 


-1 

2 


20. In each part, let T_A: R^3 → R^2 be multiplication by A, and let
u_1 = (0, 1, 1), u_2 = (2, -1, 1), and u_3 = (1, 1, -2).
Determine whether the set {T_A(u_1), T_A(u_2), T_A(u_3)} spans R^2.


(a) A = 


1 

0 


1 0 
1 -1 


(b) A = 


1 0 
1 -3 


21. If T_A is multiplication by a matrix A with three columns, then
the kernel of T_A is one of four possible geometric objects. What
are they? Explain how you reached your conclusion.


13. Determine whether the following polynomials span P_2.

p_1 = 1 - x + 2x^2, p_2 = 3 + x,

p_3 = 5 - x + 4x^2, p_4 = -2 - 2x + 2x^2

14. Let f = cos^2 x and g = sin^2 x. Which of the following lie in
the space spanned by f and g?

(a) cos 2x (b) 3 + x^2 (c) 1 (d) sin x (e) 0

15. Determine whether the solution space of the system Ax = 0 
is a line through the origin, a plane through the origin, or the 


22. Let v_1 = (1, 6, 4), v_2 = (2, 4, -1), v_3 = (-1, 2, 5), and
w_1 = (1, -2, -5), w_2 = (0, 8, 9). Use Theorem 4.2.6 to show
that span{v_1, v_2, v_3} = span{w_1, w_2}.

23. The accompanying figure shows a mass-spring system in which
a block of mass m is set into vibratory motion by pulling the
block beyond its natural position at x = 0 and releasing it at
time t = 0. If friction and air resistance are ignored, then the
x-coordinate x(t) of the block at time t is given by a function
of the form

x(t) = c_1 cos ωt + c_2 sin ωt

where ω is a fixed constant that depends on the mass of the
block and the stiffness of the spring and c_1 and c_2 are
arbitrary. Show that this set of functions forms a subspace of
C^∞(-∞, ∞).


▲ Figure Ex-23 A mass-spring system shown in three positions: natural position, stretched, and released.

Working with Proofs 

24. Prove Theorem 4.2.6. 

True-False Exercises 

TF. In parts (a)-(k) determine whether the statement is true or 

false, and justify your answer. 

(a) Every subspace of a vector space is itself a vector space. 

(b) Every vector space is a subspace of itself. 

(c) Every subset of a vector space V that contains the zero vector 
in V is a subspace of V. 

(d) The kernel of a matrix transformation T_A: R^n → R^m is a
subspace of R^m.

(e) The solution set of a consistent linear system Ax = b of m
equations in n unknowns is a subspace of R^n.

(f ) The span of any finite set of vectors in a vector space is closed 
under addition and scalar multiplication. 


(g) The intersection of any two subspaces of a vector space V is a 
subspace of V. 

(h) The union of any two subspaces of a vector space V is a sub- 
space of V. 

(i) Two subsets of a vector space V that span the same subspace 
of V must be equal. 

( j) The set of upper triangular n x n matrices is a subspace of the 
vector space of all n x n matrices. 

(k) The polynomials x - 1, (x - 1)^2, and (x - 1)^3 span P_3.

Working with Technology 

T1. Recall from Theorem 1.3.1 that a product Ax can be expressed
as a linear combination of the column vectors of the matrix A in
which the coefficients are the entries of x. Use matrix
multiplication to compute

v = 6(8, -2, 1, -4) + 17(-3, 9, 11, 6) - 9(13, -1, 2, 4)

T2. Use the idea in Exercise T1 and matrix multiplication to
determine whether the polynomial

p = 1 + x + x^2 + x^3

is in the span of

p_1 = 8 - 2x + x^2 - 4x^3, p_2 = -3 + 9x + 11x^2 + 6x^3,
p_3 = 13 - x + 2x^2 + 4x^3

T3. For the vectors that follow, determine whether
span{v_1, v_2, v_3} = span{w_1, w_2, w_3}

v_1 = (-1, 2, 0, 1, 3), v_2 = (7, 4, 6, -3, 1),
v_3 = (-5, 3, 1, 2, 4)

w_1 = (-6, 5, 1, 3, 7), w_2 = (6, 6, 6, -2, 4),
w_3 = (2, 7, 7, -1, 5)

4.3 Linear Independence 

In this section we will consider the question of whether the vectors in a given set are 
interrelated in the sense that one or more of them can be expressed as a linear combination 
of the others. This is important to know in applications because the existence of such 
relationships often signals that some kind of complication is likely to occur. 

Linear Independence and Dependence

In a rectangular xy-coordinate system every vector in the plane can be expressed in
exactly one way as a linear combination of the standard unit vectors. For example, the
only way to express the vector (3, 2) as a linear combination of i = (1, 0) and j = (0, 1)
is

(3, 2) = 3(1, 0) + 2(0, 1) = 3i + 2j    (1)

(Figure 4.3.1). Suppose, however, that we were to introduce a third coordinate axis that
makes an angle of 45° with the x-axis. Call it the w-axis. As illustrated in Figure 4.3.2,
the unit vector along the w-axis is

w = (1/√2, 1/√2)

▲ Figure 4.3.1

▲ Figure 4.3.2

Whereas Formula (1) shows the only way to express the vector (3, 2) as a linear
combination of i and j, there are infinitely many ways to express this vector as a linear combination
of i, j, and w. Three possibilities are

(3, 2) = 3(1, 0) + 2(0, 1) + 0(1/√2, 1/√2) = 3i + 2j + 0w
(3, 2) = 2(1, 0) + (0, 1) + √2(1/√2, 1/√2) = 2i + j + √2 w
(3, 2) = 4(1, 0) + 3(0, 1) - √2(1/√2, 1/√2) = 4i + 3j - √2 w

In short, by introducing a superfluous axis we created the complication of having
multiple ways of assigning coordinates to points in the plane. What makes the vector w
superfluous is the fact that it can be expressed as a linear combination of the vectors i
and j, namely,

w = (1/√2, 1/√2) = (1/√2)i + (1/√2)j

This leads to the following definition.


DEFINITION 1 If S = {v_1, v_2, ..., v_r} is a set of two or more vectors in a vector space
V, then S is said to be a linearly independent set if no vector in S can be expressed as
a linear combination of the others. A set that is not linearly independent is said to be
linearly dependent.


In the case where the set S in Definition 1 has only one vector, we will agree that S is
linearly independent if and only if that vector is nonzero.


In general, the most efficient way to determine whether a set is linearly independent
or not is to use the following theorem whose proof is given at the end of this section.


THEOREM 4.3.1 A nonempty set S = {v_1, v_2, ..., v_r} in a vector space V is linearly
independent if and only if the only coefficients satisfying the vector equation

k_1 v_1 + k_2 v_2 + ... + k_r v_r = 0

are k_1 = 0, k_2 = 0, ..., k_r = 0.


► EXAMPLE 1 Linear Independence of the Standard Unit Vectors in R^n

The most basic linearly independent set in R^n is the set of standard unit vectors

e_1 = (1, 0, 0, ..., 0), e_2 = (0, 1, 0, ..., 0), ..., e_n = (0, 0, 0, ..., 1)

To illustrate this in R^3, consider the standard unit vectors

i = (1, 0, 0), j = (0, 1, 0), k = (0, 0, 1)

To prove linear independence we must show that the only coefficients satisfying the vector
equation

k_1 i + k_2 j + k_3 k = 0

are k_1 = 0, k_2 = 0, k_3 = 0. But this becomes evident by writing this equation in its
component form

(k_1, k_2, k_3) = (0, 0, 0)

You should have no trouble adapting this argument to establish the linear independence
of the standard unit vectors in R^n.


► EXAMPLE 2 Linear Independence in R^3

Determine whether the vectors

v_1 = (1, -2, 3), v_2 = (5, 6, -1), v_3 = (3, 2, 1)    (2)

are linearly independent or linearly dependent in R^3.

Solution The linear independence or dependence of these vectors is determined by
whether the vector equation

k_1 v_1 + k_2 v_2 + k_3 v_3 = 0    (3)

can be satisfied with coefficients that are not all zero. To see whether this is so, let us
rewrite (3) in the component form

k_1(1, -2, 3) + k_2(5, 6, -1) + k_3(3, 2, 1) = (0, 0, 0)

Equating corresponding components on the two sides yields the homogeneous linear
system

k_1 + 5k_2 + 3k_3 = 0
-2k_1 + 6k_2 + 2k_3 = 0    (4)
3k_1 - k_2 + k_3 = 0

Thus, our problem reduces to determining whether this system has nontrivial solutions.
There are various ways to do this; one possibility is to simply solve the system, which
yields

k_1 = -(1/2)t, k_2 = -(1/2)t, k_3 = t

(we omit the details). This shows that the system has nontrivial solutions and hence
that the vectors are linearly dependent. A second method for establishing the linear
dependence is to take advantage of the fact that the coefficient matrix

\[ A = \begin{bmatrix} 1 & 5 & 3 \\ -2 & 6 & 2 \\ 3 & -1 & 1 \end{bmatrix} \]

is square and compute its determinant. We leave it for you to show that det(A) = 0, from
which it follows that (4) has nontrivial solutions by parts (b) and (g) of Theorem 2.3.8.

Because we have established that the vectors v_1, v_2, and v_3 in (2) are linearly
dependent, we know that at least one of them is a linear combination of the others. We leave
it for you to confirm, for example, that

v_3 = (1/2)v_1 + (1/2)v_2
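The dependence relation in Example 2 can be confirmed by direct arithmetic. The sketch below assumes the one-parameter family of solutions k_1 = -t/2, k_2 = -t/2, k_3 = t of system (4); the helper name `combo` is hypothetical.

```python
from fractions import Fraction

v1, v2, v3 = (1, -2, 3), (5, 6, -1), (3, 2, 1)

def combo(coeffs, vectors):
    """Return the linear combination sum(c * v) as a tuple of components."""
    return tuple(sum(c * v[i] for c, v in zip(coeffs, vectors))
                 for i in range(len(vectors[0])))

half = Fraction(1, 2)

# Any nonzero t gives a nontrivial solution of k1*v1 + k2*v2 + k3*v3 = 0.
t = 2
print(combo((-half * t, -half * t, t), (v1, v2, v3)))   # every component is 0

# Equivalently, v3 is a linear combination of v1 and v2.
print(combo((half, half), (v1, v2)) == v3)
```

Solving the dependence relation for v_3 is what produces the coefficients 1/2 and 1/2.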


► EXAMPLE 3 Linear Independence in R^4

Determine whether the vectors

v_1 = (1, 2, 2, -1),  v_2 = (4, 9, 9, -4),  v_3 = (5, 8, 9, -5)

in R^4 are linearly dependent or linearly independent.

Solution The linear independence or linear dependence of these vectors is determined
by whether there exist nontrivial solutions of the vector equation

k_1 v_1 + k_2 v_2 + k_3 v_3 = 0

or, equivalently, of

k_1(1, 2, 2, -1) + k_2(4, 9, 9, -4) + k_3(5, 8, 9, -5) = (0, 0, 0, 0)

Equating corresponding components on the two sides yields the homogeneous linear
system

  k_1 + 4k_2 + 5k_3 = 0
 2k_1 + 9k_2 + 8k_3 = 0
 2k_1 + 9k_2 + 9k_3 = 0
 -k_1 - 4k_2 - 5k_3 = 0

We leave it for you to show that this system has only the trivial solution

k_1 = 0,  k_2 = 0,  k_3 = 0

from which you can conclude that v_1, v_2, and v_3 are linearly independent.
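
One way to carry out the check left to the reader is to row reduce the matrix whose rows are the given vectors: the vectors are linearly independent exactly when the rank equals the number of vectors. The sketch below is an illustration under that assumption; the function names `rank` and `is_independent` are hypothetical and do not come from the text.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix (given as a list of rows) by exact Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0  # index of the next pivot row
    for c in range(len(m[0])):
        # look for a row at or below r with a nonzero entry in column c
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][c] / m[r][c]
            m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r

def is_independent(vectors):
    """True when the only solution of k1*v1 + ... + kr*vr = 0 is the trivial one."""
    return rank(vectors) == len(vectors)

example_3 = [(1, 2, 2, -1), (4, 9, 9, -4), (5, 8, 9, -5)]
example_2 = [(1, -2, 3), (5, 6, -1), (3, 2, 1)]

print(is_independent(example_3))
print(is_independent(example_2))
```

Exact fractions are used instead of floating point so that a zero pivot is detected reliably.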


► EXAMPLE 4 An Important Linearly Independent Set in P_n

Show that the polynomials

1, x, x^2, ..., x^n

form a linearly independent set in P_n.

Solution For convenience, let us denote the polynomials as

p_0 = 1,  p_1 = x,  p_2 = x^2,  ...,  p_n = x^n

We must show that the only coefficients satisfying the vector equation

a_0 p_0 + a_1 p_1 + a_2 p_2 + ... + a_n p_n = 0          (5)

are

a_0 = a_1 = a_2 = ... = a_n = 0

But (5) is equivalent to the statement that

a_0 + a_1 x + a_2 x^2 + ... + a_n x^n = 0          (6)

for all x in (-∞, ∞), so we must show that this is true if and only if each coefficient in
(6) is zero. To see that this is so, recall from algebra that a nonzero polynomial of degree
n has at most n distinct roots. That being the case, each coefficient in (6) must be zero,
for otherwise the left side of the equation would be a nonzero polynomial with infinitely
many roots. Thus, (5) has only the trivial solution.


The following example shows that the problem of determining whether a given set of
vectors in P_n is linearly independent or linearly dependent can be reduced to determining
whether a certain set of vectors in R^n is linearly dependent or independent.

► EXAMPLE 5 Linear Independence of Polynomials

Determine whether the polynomials

p_1 = 1 - x,  p_2 = 5 + 3x - 2x^2,  p_3 = 1 + 3x - x^2

are linearly dependent or linearly independent in P_2.

Solution The linear independence or dependence of these vectors is determined by
whether the vector equation

k_1 p_1 + k_2 p_2 + k_3 p_3 = 0          (7)

can be satisfied with coefficients that are not all zero. To see whether this is so, let us
rewrite (7) in its polynomial form

k_1(1 - x) + k_2(5 + 3x - 2x^2) + k_3(1 + 3x - x^2) = 0          (8)

or, equivalently, as

(k_1 + 5k_2 + k_3) + (-k_1 + 3k_2 + 3k_3)x + (-2k_2 - k_3)x^2 = 0

Since this equation must be satisfied by all x in (-∞, ∞), each coefficient must be zero
(as explained in the previous example). Thus, the linear dependence or independence
of the given polynomials hinges on whether the following linear system has a nontrivial
solution:

 k_1 + 5k_2 +  k_3 = 0
-k_1 + 3k_2 + 3k_3 = 0          (9)
     - 2k_2 -  k_3 = 0

We leave it for you to show that this linear system has nontrivial solutions either by
solving it directly or by showing that the coefficient matrix has determinant zero. Thus,
the set {p_1, p_2, p_3} is linearly dependent.

In Example 5, what relationship do you see between the coefficients of the given
polynomials and the column vectors of the coefficient matrix of system (9)?
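
The reduction from polynomials to vectors of coefficients can be sketched in a few lines. The code below is an illustration rather than the text's own method: each polynomial in P_2 is stored as a tuple (constant, x coefficient, x^2 coefficient), those tuples become the columns of the coefficient matrix of system (9), and the determinant is computed by a hypothetical helper `det3`. The particular nontrivial solution (-3, 1, -2) was found by solving (9) by hand and is only verified here.

```python
def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

# Coefficients (constant, x, x^2) of p1, p2, p3 from Example 5.
p1 = (1, -1, 0)
p2 = (5, 3, -2)
p3 = (1, 3, -1)

# Column j of the coefficient matrix of system (9) is the coefficient tuple of p_j.
A = [[p1[i], p2[i], p3[i]] for i in range(3)]
det_A = det3(A)

# A nontrivial solution of (9): k1 = -3, k2 = 1, k3 = -2.
k = (-3, 1, -2)
relation = tuple(k[0] * a + k[1] * b + k[2] * c for a, b, c in zip(p1, p2, p3))

print(det_A)      # zero determinant means nontrivial solutions exist
print(relation)   # coefficients of -3*p1 + p2 - 2*p3
```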


Sets with One or Two 
Vectors 


The following useful theorem is concerned with the linear independence and linear de- 
pendence of sets with one or two vectors and sets that contain the zero vector. 


THEOREM 4.3.2 

(a) A finite set that contains 0 is linearly dependent. 

(b) A set with exactly one vector is linearly independent if and only if that vector is 
not 0. 

(c) A set with exactly two vectors is linearly independent if and only if neither vector 
is a scalar multiple of the other. 


We will prove part (a) and leave the rest as exercises. 

Proof (a) For any vectors v_1, v_2, ..., v_r, the set S = {v_1, v_2, ..., v_r, 0} is linearly
dependent since the equation

0v_1 + 0v_2 + ... + 0v_r + 1(0) = 0

expresses 0 as a linear combination of the vectors in S with coefficients that are not
all zero.

► EXAMPLE 6 Linear Independence of Two Functions

The functions f_1 = x and f_2 = sin x are linearly independent vectors in F(-∞, ∞) since
neither function is a scalar multiple of the other. On the other hand, the two functions
g_1 = sin 2x and g_2 = sin x cos x are linearly dependent because the trigonometric
identity sin 2x = 2 sin x cos x reveals that g_1 and g_2 are scalar multiples of each other.

A Geometric Interpretation of Linear Independence


Linear independence has the following useful geometric interpretations in R^2 and R^3:

Two vectors in R^2 or R^3 are linearly independent if and only if they do not lie on the
same line when they have their initial points at the origin. Otherwise one would be a
scalar multiple of the other (Figure 4.3.3).

► Figure 4.3.3

Three vectors in R^3 are linearly independent if and only if they do not lie in the same
plane when they have their initial points at the origin. Otherwise at least one would
be a linear combination of the other two (Figure 4.3.4).

► Figure 4.3.4  Panels: (a) Linearly dependent; (b) Linearly dependent; (c) Linearly independent

At the beginning of this section we observed that a third coordinate axis in R^2 is
superfluous by showing that a unit vector along such an axis would have to be expressible
as a linear combination of unit vectors along the positive x- and y-axes. That result is
a consequence of the next theorem, which shows that there can be at most n vectors in
any linearly independent set in R^n.
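
The two geometric tests for R^3 described above can be phrased computationally. The sketch below rests on standard facts that this section does not state: two vectors in R^3 lie on one line through the origin exactly when their cross product is zero, and three vectors lie in one plane through the origin exactly when their scalar triple product is zero. The function names and the sample vectors are illustrative, and exact comparison with zero is only appropriate for integer or rational entries.

```python
def cross(u, v):
    """Cross product of two vectors in R^3."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def on_same_line(u, v):
    """Two vectors in R^3 are linearly dependent iff u x v = 0."""
    return cross(u, v) == (0, 0, 0)

def in_same_plane(u, v, w):
    """Three vectors in R^3 are linearly dependent iff u . (v x w) = 0."""
    return dot(u, cross(v, w)) == 0

print(on_same_line((1, 2, 3), (2, 4, 6)))
print(in_same_plane((1, -2, 3), (5, 6, -1), (3, 2, 1)))   # the vectors from system (4)
print(in_same_plane((1, 0, 0), (0, 1, 0), (0, 0, 1)))     # i, j, k
```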


THEOREM 4.3.3 Let S = {v_1, v_2, ..., v_r} be a set of vectors in R^n. If r > n, then S is
linearly dependent.

It follows from Theorem 4.3.3 that a set in R^2 with more than two vectors is linearly
dependent and a set in R^3 with more than three vectors is linearly dependent.

Proof Suppose that

v_1 = (v_{11}, v_{12}, ..., v_{1n})
v_2 = (v_{21}, v_{22}, ..., v_{2n})
  ...
v_r = (v_{r1}, v_{r2}, ..., v_{rn})

and consider the equation

k_1 v_1 + k_2 v_2 + ... + k_r v_r = 0

If we express both sides of this equation in terms of components and then equate the
corresponding components, we obtain the system

v_{11} k_1 + v_{21} k_2 + ... + v_{r1} k_r = 0
v_{12} k_1 + v_{22} k_2 + ... + v_{r2} k_r = 0
  ...
v_{1n} k_1 + v_{2n} k_2 + ... + v_{rn} k_r = 0

This is a homogeneous system of n equations in the r unknowns k_1, ..., k_r. Since
r > n, it follows from Theorem 1.2.2 that the system has nontrivial solutions. Therefore,
S = {v_1, v_2, ..., v_r} is a linearly dependent set.
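
The proof is constructive in spirit: with more unknowns than equations, Gauss-Jordan elimination must leave a free variable, and setting it to 1 yields a nontrivial solution. The following sketch illustrates this under stated assumptions; the function name `nontrivial_solution` and the sample vectors are hypothetical, and exact fractions are used to avoid rounding issues.

```python
from fractions import Fraction

def nontrivial_solution(vectors):
    """Return [k_1, ..., k_r], not all zero, with sum k_i * v_i = 0, or None."""
    r, n = len(vectors), len(vectors[0])
    # n x r coefficient matrix: column j holds the components of vectors[j]
    m = [[Fraction(vectors[j][i]) for j in range(r)] for i in range(n)]
    pivots = []   # pivots[i] is the pivot column of row i
    row = 0
    for col in range(r):
        p = next((i for i in range(row, n) if m[i][col] != 0), None)
        if p is None:
            continue
        m[row], m[p] = m[p], m[row]
        pivot = m[row][col]
        m[row] = [x / pivot for x in m[row]]
        for i in range(n):
            if i != row and m[i][col] != 0:
                f = m[i][col]
                m[i] = [a - f * b for a, b in zip(m[i], m[row])]
        pivots.append(col)
        row += 1
        if row == n:
            break
    free = [c for c in range(r) if c not in pivots]
    if not free:
        return None                      # only the trivial solution exists
    k = [Fraction(0)] * r
    k[free[0]] = Fraction(1)             # set one free variable to 1
    for i, pc in enumerate(pivots):
        k[pc] = -m[i][free[0]]           # read off the pivot variables
    return k

vs = [(1, 2), (3, 4), (5, 6)]            # three vectors in R^2, so r > n
k = nontrivial_solution(vs)
check = tuple(sum(ki * v[i] for ki, v in zip(k, vs)) for i in range(2))
print(k, check)
```

When r > n the list of free columns can never be empty, which is exactly the content of Theorem 4.3.3.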


CALCULUS REQUIRED 

Linear Independence of 
Functions 


Sometimes linear dependence of functions can be deduced from known identities. For 
example, the functions 

f_1 = sin^2 x,  f_2 = cos^2 x,  and  f_3 = 5

form a linearly dependent set in F(-∞, ∞), since the equation

5f_1 + 5f_2 - f_3 = 5 sin^2 x + 5 cos^2 x - 5
                  = 5(sin^2 x + cos^2 x) - 5 = 0

expresses 0 as a linear combination of f_1, f_2, and f_3 with coefficients that are not all zero.

However, it is relatively rare that linear independence or dependence of functions can 
be ascertained by algebraic or trigonometric methods. To make matters worse, there is 
no general method for doing that either. That said, there does exist a theorem that can 
be useful for that purpose in certain cases. The following definition is needed for that 
theorem. 


DEFINITION 2 If f_1 = f_1(x), f_2 = f_2(x), ..., f_n = f_n(x) are functions that are
n - 1 times differentiable on the interval (-∞, ∞), then the determinant

W(x) = \begin{vmatrix} f_1(x) & f_2(x) & \cdots & f_n(x) \\ f_1'(x) & f_2'(x) & \cdots & f_n'(x) \\ \vdots & \vdots & & \vdots \\ f_1^{(n-1)}(x) & f_2^{(n-1)}(x) & \cdots & f_n^{(n-1)}(x) \end{vmatrix}

is called the Wronskian of f_1, f_2, ..., f_n.





Jozef Hoene de Wronski 


(1778-1853) 


The Polish-French mathematician Jozef Hoene de 
Wronski was born Jozef Hoene and adopted the name Wronski after 
he married. Wronski's life was fraught with controversy and conflict, 
which some say was due to psychopathic tendencies and his exag- 
geration of the importance of his own work. Although Wronski's work 
was dismissed as rubbish for many years, and much of it was indeed 
erroneous, some of his ideas contained hidden brilliance and have sur- 
vived. Among other things, Wronski designed a caterpillar vehicle to 
compete with trains (though it was never manufactured) and did re- 
search on the famous problem of determining the longitude of a ship 
at sea. His final years were spent in poverty. 

[Image: © TopFoto/The Image Works]

Suppose for the moment that f_1 = f_1(x), f_2 = f_2(x), ..., f_n = f_n(x) are linearly
dependent vectors in C^(n-1)(-∞, ∞). This implies that the vector equation

k_1 f_1 + k_2 f_2 + ... + k_n f_n = 0

is satisfied by values of the coefficients k_1, k_2, ..., k_n that are not all zero, and for these
coefficients the equation

k_1 f_1(x) + k_2 f_2(x) + ... + k_n f_n(x) = 0

is satisfied for all x in (-∞, ∞). Using this equation together with those that result by
differentiating it n - 1 times we obtain the linear system

k_1 f_1(x)       + k_2 f_2(x)       + ... + k_n f_n(x)       = 0
k_1 f_1'(x)      + k_2 f_2'(x)      + ... + k_n f_n'(x)      = 0
  ...
k_1 f_1^(n-1)(x) + k_2 f_2^(n-1)(x) + ... + k_n f_n^(n-1)(x) = 0

Thus, the linear dependence of f_1, f_2, ..., f_n implies that the linear system

\begin{bmatrix} f_1(x) & f_2(x) & \cdots & f_n(x) \\ f_1'(x) & f_2'(x) & \cdots & f_n'(x) \\ \vdots & \vdots & & \vdots \\ f_1^{(n-1)}(x) & f_2^{(n-1)}(x) & \cdots & f_n^{(n-1)}(x) \end{bmatrix} \begin{bmatrix} k_1 \\ k_2 \\ \vdots \\ k_n \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}          (10)

has a nontrivial solution for every x in the interval (-∞, ∞), and this in turn implies
that the determinant of the coefficient matrix of (10) is zero for every such x. Since this
determinant is the Wronskian of f_1, f_2, ..., f_n, we have established the following result.


THEOREM 4.3.4 If the functions f_1, f_2, ..., f_n have n - 1 continuous derivatives
on the interval (-∞, ∞), and if the Wronskian of these functions is not identically
zero on (-∞, ∞), then these functions form a linearly independent set of vectors in
C^(n-1)(-∞, ∞).

WARNING The converse of Theorem 4.3.4 is false. If the Wronskian of f_1, f_2, ..., f_n is
identically zero on (-∞, ∞), then no conclusion can be reached about the linear
independence of {f_1, f_2, ..., f_n}; this set of vectors may be linearly independent or
linearly dependent.
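
To make the warning concrete, here is a standard illustration that does not appear in the text: f(x) = x^2 and g(x) = x|x| both have a continuous first derivative on (-∞, ∞), their Wronskian is zero at every x, and yet neither is a scalar multiple of the other. The sketch below checks this numerically at a few sample points; the derivative formulas are supplied by hand and the sample points are arbitrary.

```python
def f(x):  return x * x          # f(x) = x^2
def g(x):  return x * abs(x)     # g(x) = x|x|
def df(x): return 2 * x          # f'(x)
def dg(x): return 2 * abs(x)     # g'(x), valid for every real x including 0

def wronskian(x):
    """W(x) = f(x) g'(x) - g(x) f'(x)."""
    return f(x) * dg(x) - g(x) * df(x)

samples = [-3, -1, -0.5, 0, 0.5, 1, 3]
w_values = [wronskian(x) for x in samples]

# If g were c*f on the whole line, g/f would equal the same constant wherever f != 0.
ratio_right = g(1) / f(1)
ratio_left = g(-1) / f(-1)

print(w_values)
print(ratio_right, ratio_left)
```

The ratio g/f is 1 for positive x and -1 for negative x, so by part (c) of Theorem 4.3.2 the two functions are linearly independent even though their Wronskian vanishes identically.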


In Example 6 we showed that x and sin x are linearly independent functions by
observing that neither is a scalar multiple of the other. The following example illustrates
how to obtain the same result using the Wronskian (though it is a more complicated
procedure in this particular case).

► EXAMPLE 7 Linear Independence Using the Wronskian

Use the Wronskian to show that f_1 = x and f_2 = sin x are linearly independent vectors
in C^∞(-∞, ∞).

Solution The Wronskian is

W(x) = \begin{vmatrix} x & \sin x \\ 1 & \cos x \end{vmatrix} = x cos x - sin x

This function is not identically zero on the interval (-∞, ∞) since, for example,

W(π/2) = (π/2) cos(π/2) - sin(π/2) = -1

Thus, the functions are linearly independent.

► EXAMPLE 8 Linear Independence Using the Wronskian

Use the Wronskian to show that f_1 = 1, f_2 = e^x, and f_3 = e^{2x} are linearly independent
vectors in C^∞(-∞, ∞).

Solution The Wronskian is

W(x) = \begin{vmatrix} 1 & e^x & e^{2x} \\ 0 & e^x & 2e^{2x} \\ 0 & e^x & 4e^{2x} \end{vmatrix} = 2e^{3x}

This function is obviously not identically zero on (-∞, ∞), so f_1, f_2, and f_3 form a linearly
independent set.
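
The two Wronskians in Examples 7 and 8 can be spot-checked numerically. This is a minimal sketch under the assumption that the derivatives are entered by hand as Python callables; the helper names `det` and `wronskian` are illustrative. A single point where W(x) is nonzero is enough to apply Theorem 4.3.4.

```python
import math

def det(m):
    """Determinant by cofactor expansion along the first row (fine for small n)."""
    if len(m) == 1:
        return m[0][0]
    total = 0.0
    for j, a in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * a * det(minor)
    return total

def wronskian(rows, x):
    """rows[i][j] is a callable giving the i-th derivative of f_j."""
    return det([[fn(x) for fn in row] for row in rows])

# Example 7: f1 = x, f2 = sin x
ex7 = [[lambda x: x, math.sin],
       [lambda x: 1.0, math.cos]]

# Example 8: f1 = 1, f2 = e^x, f3 = e^(2x)
ex8 = [[lambda x: 1.0, math.exp, lambda x: math.exp(2 * x)],
       [lambda x: 0.0, math.exp, lambda x: 2 * math.exp(2 * x)],
       [lambda x: 0.0, math.exp, lambda x: 4 * math.exp(2 * x)]]

w7 = wronskian(ex7, math.pi / 2)   # x cos x - sin x evaluated at pi/2
w8 = wronskian(ex8, 0.0)           # 2 e^(3x) evaluated at 0

print(w7, w8)
```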


OPTIONAL We will close this section by proving Theorem 4.3.1.

Proof of Theorem 4.3.1 We will prove this theorem in the case where the set S has two
or more vectors, and leave the case where S has only one vector as an exercise. Assume
first that S is linearly independent. We will show that if the equation

k_1 v_1 + k_2 v_2 + ... + k_r v_r = 0          (11)

can be satisfied with coefficients that are not all zero, then at least one of the vectors in
S must be expressible as a linear combination of the others, thereby contradicting the
assumption of linear independence. To be specific, suppose that k_1 ≠ 0. Then we can
rewrite (11) as

v_1 = (-k_2/k_1) v_2 + ... + (-k_r/k_1) v_r

which expresses v_1 as a linear combination of the other vectors in S.

Conversely, we must show that if the only coefficients satisfying (11) are

k_1 = 0,  k_2 = 0,  ...,  k_r = 0

then the vectors in S must be linearly independent. But if this were true of the
coefficients and the vectors were not linearly independent, then at least one of them would be
expressible as a linear combination of the others, say

v_1 = c_2 v_2 + ... + c_r v_r

which we can rewrite as

v_1 + (-c_2) v_2 + ... + (-c_r) v_r = 0

But this contradicts our assumption that (11) can only be satisfied by coefficients that
are all zero. Thus, the vectors in S must be linearly independent.


Exercise Set 4.3 


1. Explain why the following form linearly dependent sets of vectors.
(Solve this problem by inspection.)

(a) u_1 = (-1, 2, 4) and u_2 = (5, -10, -20) in R^3

(b) u_1 = (3, -1), u_2 = (4, 5), u_3 = (-4, 7) in R^2

(c) p_1 = 3 - 2x + x^2 and p_2 = 6 - 4x + 2x^2 in P_2

(d) A = \begin{bmatrix} -3 & 4 \\ 2 & 0 \end{bmatrix} and B = \begin{bmatrix} 3 & -4 \\ -2 & 0 \end{bmatrix} in M_22

2. In each part, determine whether the vectors are linearly independent
or are linearly dependent in R^3.

(a) (-3, 0, 4), (5, -1, 2), (1, 1, 3)

(b) (-2, 0, 1), (3, 2, 5), (6, -1, 1), (7, 0, -2)

3. In each part, determine whether the vectors are linearly independent
or are linearly dependent in R^4.

(a) (3, 8, 7, -3), (1, 5, 3, -1), (2, -1, 2, 6), (4, 2, 6, 4)

(b) (3, 0, -3, 6), (0, 2, 3, 1), (0, -2, -2, 0), (-2, 1, 2, 1)

4. In each part, determine whether the vectors are linearly independent
or are linearly dependent in P_2.

(a) 2 - x + 4x^2, 3 + 6x + 2x^2, 2 + 10x - 4x^2

(b) 1 + 3x + 3x^2, x + 4x^2, 5 + 6x + 3x^2, 7 + 2x - x^2

5. In each part, determine whether the matrices are linearly in- 
dependent or dependent. 


(a) 

’l 

1 

o’ 


"l 

2 

1 


’o 

l" 

1 

in M 22 

2 


2 


2 

(b) 

’l 

0 

o’ 


"o 

0 

f 


1 

o 

o 

o 

1 

0 

0 

0 

’ 

0 

0 

0 

’ 

0 1 0 


6. Determine all values of k for which the following matrices are 
linearly independent in M 22 . 


1 

o 

1 


1 

1 

o 

1 


’2 0 " 

1 k 

’ 

k 1 

’ 

1 3 


7. In each part, determine whether the three vectors lie in a plane
in R^3.

(a) v_1 = (2, -2, 0), v_2 = (6, 1, 4), v_3 = (2, 0, -4)

(b) v_1 = (-6, 7, 2), v_2 = (3, 2, 4), v_3 = (4, -1, 2)

8. In each part, determine whether the three vectors lie on the
same line in R^3.

(a) v_1 = (-1, 2, 3), v_2 = (2, -4, -6), v_3 = (-3, 6, 0)

(b) v_1 = (2, -1, 4), v_2 = (4, 2, 3), v_3 = (2, 7, -6)

(c) v_1 = (4, 6, 8), v_2 = (2, 3, 4), v_3 = (-2, -3, -4)

9. (a) Show that the three vectors v_1 = (0, 3, 1, -1),
v_2 = (6, 0, 5, 1), and v_3 = (4, -7, 1, 3) form a linearly
dependent set in R^4.

(b) Express each vector in part (a) as a linear combination of
the other two.

10. (a) Show that the vectors v_1 = (1, 2, 3, 4), v_2 = (0, 1, 0, -1),
and v_3 = (1, 3, 3, 3) form a linearly dependent set in R^4.

(b) Express each vector in part (a) as a linear combination of
the other two.

11. For which real values of λ do the following vectors form a
linearly dependent set in R^3?

v_1 = (λ, -1/2, -1/2),  v_2 = (-1/2, λ, -1/2),  v_3 = (-1/2, -1/2, λ)

12. Under what conditions is a set with one vector linearly
independent?

13. In each part, let T_A: R^2 → R^2 be multiplication by A, and
let u_1 = (1, 2) and u_2 = (-1, 1). Determine whether the set
{T_A(u_1), T_A(u_2)} is linearly independent in R^2.


(a) A = 


1 

0 


-1 

2 


(b) A = 


-1 

2 


14. In each part, let T_A: R^3 → R^3 be multiplication by A, and let
u_1 = (1, 0, 0), u_2 = (2, -1, 1), and u_3 = (0, 1, 1). Determine
whether the set {T_A(u_1), T_A(u_2), T_A(u_3)} is linearly independent
in R^3.

(a) A = \begin{bmatrix} 1 & 1 & 2 \\ 1 & 0 & -3 \\ 2 & 2 & 0 \end{bmatrix}

(b) A = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & -3 \\ 2 & 2 & 0 \end{bmatrix}


15. Are the vectors v_1, v_2, and v_3 in part (a) of the accompanying
figure linearly independent? What about those in part (b)?
Explain.

16. By using appropriate identities, where required, determine
which of the following sets of vectors in F(-∞, ∞) are linearly
dependent.

(a) 6, 3 sin^2 x, 2 cos^2 x        (b) x, cos x

(c) 1, sin x, sin 2x               (d) cos 2x, sin^2 x, cos^2 x

(e) (3 - x)^2, x^2 - 6x, 5         (f) 0, cos^3 πx, sin^5 3πx

17. (Calculus required) The functions

f_1(x) = x and f_2(x) = cos x

are linearly independent in F(-∞, ∞) because neither function
is a scalar multiple of the other. Confirm the linear independence
using the Wronskian.

18. (Calculus required) The functions

f_1(x) = sin x and f_2(x) = cos x

are linearly independent in F(-∞, ∞) because neither function
is a scalar multiple of the other. Confirm the linear independence
using the Wronskian.

19. (Calculus required) Use the Wronskian to show that the
following sets of vectors are linearly independent.

(a) 1, x, e^x        (b) 1, x, x^2

20. (Calculus required) Use the Wronskian to show that the functions
f_1(x) = e^x, f_2(x) = xe^x, and f_3(x) = x^2 e^x are linearly
independent vectors in C^∞(-∞, ∞).

21. (Calculus required) Use the Wronskian to show that the functions
f_1(x) = sin x, f_2(x) = cos x, and f_3(x) = x cos x are
linearly independent vectors in C^∞(-∞, ∞).

22. Show that for any vectors u, v, and w in a vector space V, the
vectors u - v, v - w, and w - u form a linearly dependent set.

23. (a) In Example 1 we showed that the mutually orthogonal vectors
i, j, and k form a linearly independent set of vectors in
R^3. Do you think that every set of three nonzero mutually
orthogonal vectors in R^3 is linearly independent? Justify
your conclusion with a geometric argument.

(b) Justify your conclusion with an algebraic argument. [Hint:
Use dot products.]

Working with Proofs 

24. Prove that if {v_1, v_2, v_3} is a linearly independent set of vectors,
then so are {v_1, v_2}, {v_1, v_3}, {v_2, v_3}, {v_1}, {v_2}, and {v_3}.

25. Prove that if S = {v_1, v_2, ..., v_r} is a linearly independent set
of vectors, then so is every nonempty subset of S.

26. Prove that if S = {v_1, v_2, v_3} is a linearly dependent set of vectors
in a vector space V, and v_4 is any vector in V that is not
in S, then {v_1, v_2, v_3, v_4} is also linearly dependent.

27. Prove that if S = {v_1, v_2, ..., v_r} is a linearly dependent set of
vectors in a vector space V, and if v_{r+1}, ..., v_n are any vectors
in V that are not in S, then {v_1, v_2, ..., v_r, v_{r+1}, ..., v_n} is also
linearly dependent.

28. Prove that in P_2 every set with more than three vectors is linearly
dependent.

29. Prove that if {v_1, v_2} is linearly independent and v_3 does not lie
in span{v_1, v_2}, then {v_1, v_2, v_3} is linearly independent.

30. Use part (a) of Theorem 4.3.1 to prove part (b).

31. Prove part ( b ) of Theorem 4.3.2. 

32. Prove part (c) of Theorem 4.3.2. 

True-False Exercises 

TF. In parts (a)-(h) determine whether the statement is true or
false, and justify your answer.

(a) A set containing a single vector is linearly independent.

(b) The set of vectors {v, kv} is linearly dependent for every
scalar k.

(c) Every linearly dependent set contains the zero vector.

(d) If the set of vectors {v_1, v_2, v_3} is linearly independent, then
{kv_1, kv_2, kv_3} is also linearly independent for every nonzero
scalar k.

(e) If v_1, ..., v_n are linearly dependent nonzero vectors, then
at least one vector v_k is a unique linear combination of
v_1, ..., v_{k-1}.

(f) The set of 2 × 2 matrices that contain exactly two 1's and two
0's is a linearly independent set in M_22.

(g) The three polynomials (x - 1)(x + 2), x(x + 2), and
x(x - 1) are linearly independent.

(h) The functions f_1 and f_2 are linearly dependent if there is a real
number x such that k_1 f_1(x) + k_2 f_2(x) = 0 for some scalars k_1
and k_2.

Working with Technology

T1. Devise three different methods for using your technology utility
to determine whether a set of vectors in R^n is linearly independent,
and then use each of those methods to determine whether
the following vectors are linearly independent.

v_1 = (4, -5, 2, 6),  v_2 = (2, -2, 1, 3),

v_3 = (6, -3, 3, 9),  v_4 = (4, -1, 5, 6)

T2. Show that S = {cos t, sin t, cos 2t, sin 2t} is a linearly independent
set in C(-∞, ∞) by evaluating the left side of the equation

c_1 cos t + c_2 sin t + c_3 cos 2t + c_4 sin 2t = 0

at sufficiently many values of t to obtain a linear system whose
only solution is c_1 = c_2 = c_3 = c_4 = 0.


4.4 Coordinates and Basis 

We usually think of a line as being one-dimensional, a plane as two-dimensional, and the 
space around us as three-dimensional. It is the primary goal of this section and the next to 
make this intuitive notion of dimension precise. In this section we will discuss coordinate 
systems in general vector spaces and lay the groundwork for a precise definition of 
dimension in the next section. 


Coordinate Systems in Linear Algebra

In analytic geometry one uses rectangular coordinate systems to create a one-to-one
correspondence between points in 2-space and ordered pairs of real numbers and between
points in 3-space and ordered triples of real numbers (Figure 4.4.1). Although
rectangular coordinate systems are common, they are not essential. For example, Figure 4.4.2
shows coordinate systems in 2-space and 3-space in which the coordinate axes are not
mutually perpendicular.

► Figure 4.4.1 Coordinates of P in a rectangular coordinate system in 2-space.

► Figure 4.4.2 Coordinates of P in a nonrectangular coordinate system in 2-space.


In linear algebra coordinate systems are commonly specified using vectors rather
than coordinate axes. For example, in Figure 4.4.3 we have re-created the coordinate
systems in Figure 4.4.2 by using unit vectors to identify the positive directions and then
attaching coordinates to a point P using the scalar coefficients in the equations

OP = a u_1 + b u_2   and   OP = a u_1 + b u_2 + c u_3




Units of measurement are essential ingredients of any coordinate system. In ge- 
ometry problems one tries to use the same unit of measurement on all axes to avoid 
distorting the shapes of figures. This is less important in applications where coordinates 
represent physical quantities with diverse units (for example, time in seconds on one axis 
and temperature in degrees Celsius on another axis). To allow for this level of generality, 
we will relax the requirement that unit vectors be used to identify the positive directions 
and require only that those vectors be linearly independent. We will refer to these as the 
“basis vectors” for the coordinate system. In summary, it is the directions of the basis 
vectors that establish the positive directions, and it is the lengths of the basis vectors that 
establish the spacing between the integer points on the axes (Figure 4.4.4).

▲ Figure 4.4.4  Panels: equal spacing, perpendicular axes; unequal spacing, perpendicular
axes; equal spacing, skew axes; unequal spacing, skew axes.

Basis for a Vector Space

Our next goal is to extend the concepts of “basis vectors” and “coordinate systems” to
general vector spaces, and for that purpose we will need some definitions. Vector spaces
fall into two categories: A vector space V is said to be finite-dimensional if there is a
finite set of vectors in V that spans V and is said to be infinite-dimensional if no such set
exists.


DEFINITION 1 If S = {v_1, v_2, ..., v_n} is a set of vectors in a finite-dimensional vector
space V, then S is called a basis for V if:

(a) S spans V.

(b) S is linearly independent.


If you think of a basis as describing a coordinate system for a finite-dimensional 
vector space V, then part (a) of this definition guarantees that there are enough basis 
vectors to provide coordinates for all vectors in V, and part (b) guarantees that there is 
no interrelationship between the basis vectors. Here are some examples. 

► EXAMPLE 1 The Standard Basis for R^n

Recall from Example 11 of Section 4.2 that the standard unit vectors

e_1 = (1, 0, 0, ..., 0),  e_2 = (0, 1, 0, ..., 0),  ...,  e_n = (0, 0, 0, ..., 1)

span R^n and from Example 1 of Section 4.3 that they are linearly independent. Thus,
they form a basis for R^n that we call the standard basis for R^n. In particular,

i = (1, 0, 0),  j = (0, 1, 0),  k = (0, 0, 1)

is the standard basis for R^3.

► EXAMPLE 2 The Standard Basis for P_n

Show that S = {1, x, x^2, ..., x^n} is a basis for the vector space P_n of polynomials of
degree n or less.

Solution We must show that the polynomials in S are linearly independent and span
P_n. Let us denote these polynomials by

p_0 = 1,  p_1 = x,  p_2 = x^2,  ...,  p_n = x^n

We showed in Example 13 of Section 4.2 that these vectors span P_n and in Example 4
of Section 4.3 that they are linearly independent. Thus, they form a basis for P_n that we
call the standard basis for P_n.

From Examples 1 and 3 you can see that a vector space can have more than one basis.


► EXAMPLE 3 Another Basis for R^3

Show that the vectors v_1 = (1, 2, 1), v_2 = (2, 9, 0), and v_3 = (3, 3, 4) form a basis for R^3.

Solution We must show that these vectors are linearly independent and span R^3. To
prove linear independence we must show that the vector equation

c_1 v_1 + c_2 v_2 + c_3 v_3 = 0          (1)

has only the trivial solution; and to prove that the vectors span R^3 we must show that
every vector b = (b_1, b_2, b_3) in R^3 can be expressed as

c_1 v_1 + c_2 v_2 + c_3 v_3 = b          (2)

By equating corresponding components on the two sides, these two equations can be
expressed as the linear systems

 c_1 + 2c_2 + 3c_3 = 0             c_1 + 2c_2 + 3c_3 = b_1
2c_1 + 9c_2 + 3c_3 = 0    and     2c_1 + 9c_2 + 3c_3 = b_2          (3)
 c_1        + 4c_3 = 0             c_1        + 4c_3 = b_3

(verify). Thus, we have reduced the problem to showing that in (3) the homogeneous
system has only the trivial solution and that the nonhomogeneous system is consistent
for all values of b_1, b_2, and b_3. But the two systems have the same coefficient matrix

A = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 9 & 3 \\ 1 & 0 & 4 \end{bmatrix}

so it follows from parts (b), (e), and (g) of Theorem 2.3.8 that we can prove both results
at the same time by showing that det(A) ≠ 0. We leave it for you to confirm that
det(A) = -1, which proves that the vectors v_1, v_2, and v_3 form a basis for R^3.
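
The determinant test in Example 3 is easy to confirm by machine, and once det(A) ≠ 0 is known the nonhomogeneous system in (3) can be solved for any right-hand side. The sketch below is one possible illustration: it uses Cramer's rule, which this example does not itself invoke, and the right-hand side b = (5, -1, 9) is an arbitrary choice made only for demonstration. The helper names `det3` and `solve` are hypothetical.

```python
from fractions import Fraction

def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

A = [[1, 2, 3],
     [2, 9, 3],
     [1, 0, 4]]          # columns are v1, v2, v3 from Example 3

def solve(A, b):
    """Solve A c = b by Cramer's rule; requires det(A) != 0."""
    d = det3(A)
    if d == 0:
        raise ValueError("the columns do not form a basis: det(A) = 0")
    result = []
    for j in range(3):
        # replace column j of A by b
        Aj = [[b[i] if k == j else A[i][k] for k in range(3)] for i in range(3)]
        result.append(Fraction(det3(Aj), d))
    return tuple(result)

det_A = det3(A)
c = solve(A, (5, -1, 9))     # b is an arbitrary illustrative vector

print(det_A)
print(c)                     # coefficients with c1*v1 + c2*v2 + c3*v3 = b
```

Because det(A) is nonzero, `solve` succeeds for every b, which is the spanning half of the argument; taking b = 0 returns only the trivial solution, which is the independence half.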


► EXAMPLE 4 The Standard Basis for M_mn

Show that the matrices

M_1 = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix},  M_2 = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix},  M_3 = \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix},  M_4 = \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}

form a basis for the vector space M_22 of 2 × 2 matrices.

Solution We must show that the matrices are linearly independent and span M_22. To
prove linear independence we must show that the equation

c_1 M_1 + c_2 M_2 + c_3 M_3 + c_4 M_4 = 0          (4)

has only the trivial solution, where 0 is the 2 × 2 zero matrix; and to prove that the
matrices span M_22 we must show that every 2 × 2 matrix

B = \begin{bmatrix} a & b \\ c & d \end{bmatrix}

can be expressed as

c_1 M_1 + c_2 M_2 + c_3 M_3 + c_4 M_4 = B          (5)

The matrix forms of Equations (4) and (5) are

c_1 \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} + c_2 \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} + c_3 \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix} + c_4 \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}

and

c_1 \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} + c_2 \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} + c_3 \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix} + c_4 \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} a & b \\ c & d \end{bmatrix}

which can be rewritten as

\begin{bmatrix} c_1 & c_2 \\ c_3 & c_4 \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}   and   \begin{bmatrix} c_1 & c_2 \\ c_3 & c_4 \end{bmatrix} = \begin{bmatrix} a & b \\ c & d \end{bmatrix}

Since the first equation has only the trivial solution

c_1 = c_2 = c_3 = c_4 = 0

the matrices are linearly independent, and since the second equation has the solution

c_1 = a,  c_2 = b,  c_3 = c,  c_4 = d

the matrices span M_22. This proves that the matrices M_1, M_2, M_3, M_4 form a basis for
M_22. More generally, the mn different matrices whose entries are zero except for a single
entry of 1 form a basis for M_mn called the standard basis for M_mn.
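
The general statement at the end of Example 4 can be illustrated with a short sketch. The function names `standard_basis` and `combine`, the row-by-row ordering of the basis matrices, and the sample matrix B are all assumptions made for this illustration; matrices are stored as nested lists.

```python
def standard_basis(m, n):
    """The mn matrices of size m x n with a single entry 1, listed row by row."""
    return [[[1 if (i, j) == (p, q) else 0 for j in range(n)] for i in range(m)]
            for p in range(m) for q in range(n)]

def combine(coeffs, mats):
    """Return the linear combination sum coeffs[k] * mats[k]."""
    rows, cols = len(mats[0]), len(mats[0][0])
    return [[sum(c * M[i][j] for c, M in zip(coeffs, mats)) for j in range(cols)]
            for i in range(rows)]

B = [[7, -2],
     [3, 5]]                                     # an arbitrary 2 x 2 matrix
basis = standard_basis(2, 2)                     # M1, M2, M3, M4 of Example 4
coeffs = [entry for row in B for entry in row]   # c1 = a, c2 = b, c3 = c, c4 = d

print(basis)
print(combine(coeffs, basis) == B)
```

Reading the entries of B row by row gives exactly the coefficients c_1 = a, c_2 = b, c_3 = c, c_4 = d found in the example.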


The simplest of all vector spaces is the zero vector space V = {0}. This space is
finite-dimensional because it is spanned by the vector 0. However, it has no basis in the
sense of Definition 1 because {0} is not a linearly independent set (why?). However, we
will find it useful to define the empty set ∅ to be a basis for this vector space.

► EXAMPLE 5 An Infinite-Dimensional Vector Space

Show that the vector space P_∞ of all polynomials with real coefficients is
infinite-dimensional by showing that it has no finite spanning set.

Solution If there were a finite spanning set, say S = {p_1, p_2, ..., p_r}, then the degrees
of the polynomials in S would have a maximum value, say n; and this in turn would
imply that any linear combination of the polynomials in S would have degree at most n.
Thus, there would be no way to express the polynomial x^{n+1} as a linear combination of
the polynomials in S, contradicting the fact that the vectors in S span P_∞.


► EXAMPLE 6 Some Finite- and Infinite-Dimensional Spaces

In Examples 1, 2, and 4 we found bases for R^n, P_n, and M_mn, so these vector spaces
are finite-dimensional. We showed in Example 5 that the vector space P_∞ is not spanned
by finitely many vectors and hence is infinite-dimensional. Some other examples of
infinite-dimensional vector spaces are R^∞, F(-∞, ∞), C(-∞, ∞), C^m(-∞, ∞), and
C^∞(-∞, ∞).

Coordinates Relative to a Basis

Earlier in this section we drew an informal analogy between basis vectors and coordinate
systems. Our next goal is to make this informal idea precise by defining the notion of a
coordinate system in a general vector space. The following theorem will be our first step
in that direction.

THEOREM 4.4.1 Uniqueness of Basis Representation

If S = {v_1, v_2, ..., v_n} is a basis for a vector space V, then every vector v in V can be
expressed in the form v = c_1 v_1 + c_2 v_2 + ... + c_n v_n in exactly one way.


Proof Since S spans V, it follows from the definition of a spanning set that every vector
in V is expressible as a linear combination of the vectors in S. To see that there is only
one way to express a vector as a linear combination of the vectors in S, suppose that
some vector v can be written as

v = c_1 v_1 + c_2 v_2 + ... + c_n v_n

and also as

v = k_1 v_1 + k_2 v_2 + ... + k_n v_n

Subtracting the second equation from the first gives

0 = (c_1 - k_1)v_1 + (c_2 - k_2)v_2 + ... + (c_n - k_n)v_n

Since the right side of this equation is a linear combination of vectors in S, the linear
independence of S implies that

c_1 - k_1 = 0,  c_2 - k_2 = 0,  ...,  c_n - k_n = 0

that is,

c_1 = k_1,  c_2 = k_2,  ...,  c_n = k_n

Thus, the two expressions for v are the same.

We now have all of the ingredients required to define the notion of “coordinates” in a
general vector space V. For motivation, observe that in R^3, for example, the coordinates
(a, b, c) of a vector v are precisely the coefficients in the formula

v = ai + bj + ck

that expresses v as a linear combination of the standard basis vectors for R^3 (see
Figure 4.4.5). The following definition generalizes this idea.

▲ Figure 4.4.5


DEFINITION 2 If S = {v_1, v_2, ..., v_n} is a basis for a vector space V, and

v = c_1 v_1 + c_2 v_2 + ... + c_n v_n

is the expression for a vector v in terms of the basis S, then the scalars c_1, c_2, ..., c_n
are called the coordinates of v relative to the basis S. The vector (c_1, c_2, ..., c_n) in
R^n constructed from these coordinates is called the coordinate vector of v relative to
S; it is denoted by

(v)_S = (c_1, c_2, ..., c_n)          (6)

Sometimes it will be desirable to write a coordinate vector as a column matrix or row
matrix, in which case we will denote it with square brackets as [v]_S. We will refer to this
as the matrix form of the coordinate vector and (6) as the comma-delimited form.


Remark It is standard to regard two sets to be the same if they have the same members, even if 
those members are written in a different order. In particular, in a basis for a vector space V, which 
is a set of linearly independent vectors that span V, the order in which those vectors are listed 
does not generally matter. However, the order in which they are listed is critical for coordinate 
vectors, since changing the order of the basis vectors changes the coordinate vectors [for example, 
in $R^2$ the coordinate pair (1, 2) is not the same as the coordinate pair (2, 1)]. To deal with this
complication, many authors define an ordered basis to be one in which the listing order of the 
basis vectors remains fixed. In all discussions involving coordinate vectors we will assume that the 
underlying basis is ordered, even though we may not say so explicitly. 
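
The effect of the listing order can be seen in a small worked illustration. The two ordered bases $S$ and $S'$ below are chosen here for illustration only (they are not taken from the text); both contain the standard basis vectors of $R^2$, listed in opposite orders.

```latex
% Illustrative assumption: e_1 = (1,0), e_2 = (0,1), S = {e_1, e_2}, S' = {e_2, e_1}.
\[
\mathbf{v} = (1, 2) = 1\,\mathbf{e}_1 + 2\,\mathbf{e}_2 = 2\,\mathbf{e}_2 + 1\,\mathbf{e}_1
\]
\[
(\mathbf{v})_S = (1, 2), \qquad (\mathbf{v})_{S'} = (2, 1)
\]
```

As sets, $S$ and $S'$ are equal, yet the same vector receives different coordinate vectors, which is why the basis is treated as ordered.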


Observe that $(\mathbf{v})_S$ is a vector in $R^n$, so that once an ordered basis S is given for a
vector space V, Theorem 4.4.1 establishes a one-to-one correspondence between vectors
in V and vectors in $R^n$ (Figure 4.4.6).


A one-to-one correspondence

► EXAMPLE 7 Coordinates Relative to the Standard Basis for $R^n$

In the special case where $V = R^n$ and S is the standard basis, the coordinate vector $(\mathbf{v})_S$
and the vector v are the same; that is,

$$\mathbf{v} = (\mathbf{v})_S$$

For example, in $R^3$ the representation of a vector $\mathbf{v} = (a, b, c)$ as a linear combination
of the vectors in the standard basis $S = \{\mathbf{i}, \mathbf{j}, \mathbf{k}\}$ is

$$\mathbf{v} = a\mathbf{i} + b\mathbf{j} + c\mathbf{k}$$

so the coordinate vector relative to this basis is $(\mathbf{v})_S = (a, b, c)$, which is the same as the
vector v.


► EXAMPLE 8 Coordinate Vectors Relative to Standard Bases 

(a) Find the coordinate vector for the polynomial

$$\mathbf{p}(x) = c_0 + c_1 x + c_2 x^2 + \cdots + c_n x^n$$

relative to the standard basis for the vector space $P_n$.

(b) Find the coordinate vector of

$$B = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$$

relative to the standard basis for $M_{22}$.

Solution (a) The given formula for $\mathbf{p}(x)$ expresses this polynomial as a linear combination
of the standard basis vectors $S = \{1, x, x^2, \ldots, x^n\}$. Thus, the coordinate vector
for p relative to S is

$$(\mathbf{p})_S = (c_0, c_1, c_2, \ldots, c_n)$$

Solution (b) We showed in Example 4 that the representation of a vector

$$B = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$$

as a linear combination of the standard basis vectors is

$$B = \begin{bmatrix} a & b \\ c & d \end{bmatrix} = a\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} + b\begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} + c\begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix} + d\begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}$$

so the coordinate vector of B relative to S is

$$(B)_S = (a, b, c, d)$$


► EXAMPLE 9 Coordinates in $R^3$

(a) We showed in Example 3 that the vectors

$$\mathbf{v}_1 = (1, 2, 1), \quad \mathbf{v}_2 = (2, 9, 0), \quad \mathbf{v}_3 = (3, 3, 4)$$

form a basis for $R^3$. Find the coordinate vector of $\mathbf{v} = (5, -1, 9)$ relative to the
basis $S = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$.

(b) Find the vector v in $R^3$ whose coordinate vector relative to S is $(\mathbf{v})_S = (-1, 3, 2)$.

Solution (a) To find $(\mathbf{v})_S$ we must first express v as a linear combination of the vectors
in S; that is, we must find values of $c_1$, $c_2$, and $c_3$ such that

$$\mathbf{v} = c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + c_3\mathbf{v}_3$$

or, in terms of components,

$$(5, -1, 9) = c_1(1, 2, 1) + c_2(2, 9, 0) + c_3(3, 3, 4)$$

Equating corresponding components gives

$$\begin{aligned} c_1 + 2c_2 + 3c_3 &= 5 \\ 2c_1 + 9c_2 + 3c_3 &= -1 \\ c_1 \phantom{{}+2c_2} + 4c_3 &= 9 \end{aligned}$$

Solving this system we obtain $c_1 = 1$, $c_2 = -1$, $c_3 = 2$ (verify). Therefore,

$$(\mathbf{v})_S = (1, -1, 2)$$

Solution (b) Using the definition of $(\mathbf{v})_S$, we obtain

$$\mathbf{v} = (-1)\mathbf{v}_1 + 3\mathbf{v}_2 + 2\mathbf{v}_3 = (-1)(1, 2, 1) + 3(2, 9, 0) + 2(3, 3, 4) = (11, 31, 7)$$ ◄
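
The two computations in Example 9 can be sketched in code. This is a minimal illustration rather than a method from the text: the helper names `solve_square`, `coordinate_vector`, and `from_coordinates` are hypothetical, and the sketch assumes the basis vectors lie in $R^n$ and that exact rational arithmetic is acceptable.

```python
from fractions import Fraction

def solve_square(A, b):
    """Solve A x = b exactly by Gauss-Jordan elimination (assumes A is invertible)."""
    n = len(A)
    # Augmented matrix [A | b] with exact rational entries.
    M = [[Fraction(x) for x in row] + [Fraction(bi)] for row, bi in zip(A, b)]
    for col in range(n):
        # Pick a row at or below the diagonal with a nonzero entry in this column.
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [x / M[col][col] for x in M[col]]
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [row[n] for row in M]

def coordinate_vector(basis, v):
    """Return (v)_S for an ordered basis S of R^n (hypothetical helper)."""
    n = len(v)
    # Column j of the coefficient matrix is the j-th basis vector.
    A = [[basis[j][i] for j in range(n)] for i in range(n)]
    return solve_square(A, v)

def from_coordinates(basis, coords):
    """Rebuild v = c_1 v_1 + ... + c_n v_n from its coordinates."""
    n = len(basis[0])
    return [sum(c * b[i] for c, b in zip(coords, basis)) for i in range(n)]

S = [(1, 2, 1), (2, 9, 0), (3, 3, 4)]          # the basis of Example 9
coords = coordinate_vector(S, (5, -1, 9))      # part (a)
v = from_coordinates(S, (-1, 3, 2))            # part (b)
print(coords, v)
```

Because the order of `S` fixes which column belongs to which coordinate, permuting the list permutes the entries of the result, in line with the remark on ordered bases.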


Exercise Set 4.4 

1. Use the method of Example 3 to show that the following set
of vectors forms a basis for $R^2$.

{(2, 1), (3, 0)}

2. Use the method of Example 3 to show that the following set
of vectors forms a basis for $R^3$.

{(3, 1, -4), (2, 5, 6), (1, 4, 8)}

3. Show that the following polynomials form a basis for $P_2$.

$x^2 + 1, \quad x^2 - 1, \quad 2x - 1$

4. Show that the following polynomials form a basis for $P_3$.

$1 + x, \quad 1 - x, \quad 1 - x^2, \quad 1 - x^3$

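5. Show that the following matrices form a basis for $M_{22}$.

$$\begin{bmatrix} 3 & 6 \\ 3 & -6 \end{bmatrix}, \quad \begin{bmatrix} 0 & -1 \\ -1 & 0 \end{bmatrix}, \quad \begin{bmatrix} 0 & -8 \\ -12 & -4 \end{bmatrix}, \quad \begin{bmatrix} 1 & 0 \\ -1 & 2 \end{bmatrix}$$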


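6. Show that the following matrices form a basis for $M_{22}$.

$$\begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}, \quad \begin{bmatrix} 1 & -1 \\ 0 & 0 \end{bmatrix}, \quad \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}, \quad \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}$$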


7. In each part, show that the set of vectors is not a basis for $R^3$.

(a) {(2, -3, 1), (4, 1, 1), (0, -7, 1)}

(b) {(1, 6, 4), (2, 4, -1), (-1, 2, 5)}

8. Show that the following vectors do not form a basis for $P_2$.

$1 - 3x + 2x^2, \quad 1 + x + 4x^2, \quad 1 - 7x$


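9. Show that the following matrices do not form a basis for $M_{22}$.

$$\begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}, \quad \begin{bmatrix} 2 & -2 \\ 3 & 2 \end{bmatrix}, \quad \begin{bmatrix} 1 & -1 \\ 1 & 0 \end{bmatrix}, \quad \begin{bmatrix} 0 & -1 \\ 1 & 1 \end{bmatrix}$$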


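10. Let V be the space spanned by $\mathbf{v}_1 = \cos^2 x$, $\mathbf{v}_2 = \sin^2 x$,
$\mathbf{v}_3 = \cos 2x$.

(a) Show that $S = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ is not a basis for V.

(b) Find a basis for V.

11. Find the coordinate vector of w relative to the basis
$S = \{\mathbf{u}_1, \mathbf{u}_2\}$ for $R^2$.

(a) $\mathbf{u}_1 = (2, -4)$, $\mathbf{u}_2 = (3, 8)$; $\mathbf{w} = (1, 1)$

(b) $\mathbf{u}_1 = (1, 1)$, $\mathbf{u}_2 = (0, 2)$; $\mathbf{w} = (a, b)$

12. Find the coordinate vector of w relative to the basis
$S = \{\mathbf{u}_1, \mathbf{u}_2\}$ for $R^2$.

(a) $\mathbf{u}_1 = (1, -1)$, $\mathbf{u}_2 = (1, 1)$; $\mathbf{w} = (1, 0)$

(b) $\mathbf{u}_1 = (1, -1)$, $\mathbf{u}_2 = (1, 1)$; $\mathbf{w} = (0, 1)$

13. Find the coordinate vector of v relative to the basis
$S = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ for $R^3$.

(a) $\mathbf{v} = (2, -1, 3)$; $\mathbf{v}_1 = (1, 0, 0)$, $\mathbf{v}_2 = (2, 2, 0)$,
$\mathbf{v}_3 = (3, 3, 3)$

(b) $\mathbf{v} = (5, -12, 3)$; $\mathbf{v}_1 = (1, 2, 3)$, $\mathbf{v}_2 = (-4, 5, 6)$,
$\mathbf{v}_3 = (7, -8, 9)$

14. Find the coordinate vector of p relative to the basis
$S = \{\mathbf{p}_1, \mathbf{p}_2, \mathbf{p}_3\}$ for $P_2$.

(a) $\mathbf{p} = 4 - 3x + x^2$; $\mathbf{p}_1 = 1$, $\mathbf{p}_2 = x$, $\mathbf{p}_3 = x^2$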

(b) $\mathbf{p} = 2 - x + x^2$; $\mathbf{p}_1 = 1 + x$, $\mathbf{p}_2 = 1 + x^2$, $\mathbf{p}_3 = x + x^2$

In Exercises 15-16, first show that the set $S = \{A_1, A_2, A_3, A_4\}$
is a basis for $M_{22}$, then express A as a linear combination of the
vectors in S, and then find the coordinate vector of A relative
to S.

15. $A_1 = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}$, $A_2 = \begin{bmatrix} 0 & 1 \\ 1 & 1 \end{bmatrix}$, $A_3 = \begin{bmatrix} 0 & 0 \\ 1 & 1 \end{bmatrix}$, $A_4 = \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}$; $A = \begin{bmatrix} 1 & 0 \\ 1 & 0 \end{bmatrix}$

16. $A_1 = \begin{bmatrix} 1 & 0 \\ 1 & 0 \end{bmatrix}$, $A_2 = \begin{bmatrix} 1 & 1 \\ 0 & 0 \end{bmatrix}$, $A_3 = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$, $A_4 = \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix}$; $A = \begin{bmatrix} 6 & 2 \\ 5 & 3 \end{bmatrix}$

In Exercises 17-18, first show that the set $S = \{\mathbf{p}_1, \mathbf{p}_2, \mathbf{p}_3\}$ is a
basis for $P_2$, then express p as a linear combination of the vectors
in S, and then find the coordinate vector of p relative to S.

17. $\mathbf{p}_1 = 1 + x + x^2$, $\mathbf{p}_2 = x + x^2$, $\mathbf{p}_3 = x^2$;
$\mathbf{p} = 7 - x + 2x^2$

18. $\mathbf{p}_1 = 1 + 2x + x^2$, $\mathbf{p}_2 = 2 + 9x$, $\mathbf{p}_3 = 3 + 3x + 4x^2$;
$\mathbf{p} = 2 + 17x - 3x^2$

19. In words, explain why the sets of vectors in parts (a) to (d) are
not bases for the indicated vector spaces.

(a) $\mathbf{u}_1 = (1, 2)$, $\mathbf{u}_2 = (0, 3)$, $\mathbf{u}_3 = (1, 5)$ for $R^2$

(b) $\mathbf{u}_1 = (-1, 3, 2)$, $\mathbf{u}_2 = (6, 1, 1)$ for $R^3$

(c) $\mathbf{p}_1 = 1 + x + x^2$, $\mathbf{p}_2 = x$ for $P_2$

(d) $A = \begin{bmatrix} 1 & 0 \\ 2 & 3 \end{bmatrix}$, $B = \begin{bmatrix} 6 & 0 \\ -1 & 4 \end{bmatrix}$, $C = \begin{bmatrix} 3 & 0 \\ 1 & 7 \end{bmatrix}$, $D = \begin{bmatrix} 5 & 0 \\ 4 & 2 \end{bmatrix}$ for $M_{22}$

20. In any vector space a set that contains the zero vector must be
linearly dependent. Explain why this is so.

21. In each part, let $T_A: R^3 \to R^3$ be multiplication by A, and let
$\{\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3\}$ be the standard basis for $R^3$. Determine whether
the set $\{T_A(\mathbf{e}_1), T_A(\mathbf{e}_2), T_A(\mathbf{e}_3)\}$ is linearly independent in $R^3$.

(a) $A = \begin{bmatrix} 1 & 1 & 1 \\ 0 & 1 & -3 \\ -1 & 2 & 0 \end{bmatrix}$ (b) $A = \begin{bmatrix} 1 & 1 & 2 \\ 0 & 1 & 1 \\ -1 & 2 & 1 \end{bmatrix}$

22. In each part, let $T_A: R^3 \to R^3$ be multiplication by A, and let
$\mathbf{u} = (1, -2, -1)$. Find the coordinate vector of $T_A(\mathbf{u})$ relative
to the basis $S = \{(1, 1, 0), (0, 1, 1), (1, 1, 1)\}$ for $R^3$.

(a) $A = \begin{bmatrix} 2 & -1 & 0 \\ 1 & 1 & 1 \\ 0 & -1 & 2 \end{bmatrix}$ (b) $A = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 0 & 1 \end{bmatrix}$

23. The accompanying figure shows a rectangular xy-coordinate
system determined by the unit basis vectors i and j and
an x'y'-coordinate system determined by unit basis vectors $\mathbf{u}_1$
and $\mathbf{u}_2$. Find the x'y'-coordinates of the points whose xy-coordinates
are given.

(a) $(\sqrt{3}, 1)$ (b) (1, 0) (c) (0, 1) (d) (a, b)

◄ Figure Ex-23

24. The accompanying figure shows a rectangular xy-coordinate
system and an x'y'-coordinate system with skewed axes. Assuming
that 1-unit scales are used on all the axes, find the x'y'-coordinates
of the points whose xy-coordinates are given.

(a) (1, 1) (b) (1, 0) (c) (0, 1) (d) (a, b)

◄ Figure Ex-24


25. The first four Hermite polynomials [named for the French
mathematician Charles Hermite (1822-1901)] are

$1, \quad 2t, \quad -2 + 4t^2, \quad -12t + 8t^3$

These polynomials have a wide variety of applications in
physics and engineering.

(a) Show that the first four Hermite polynomials form a basis
for $P_3$.

(b) Let B be the basis in part (a). Find the coordinate vector
of the polynomial

$\mathbf{p}(t) = -1 - 4t + 8t^2 + 8t^3$

relative to B.

26. The first four Laguerre polynomials [named for the French
mathematician Edmond Laguerre (1834-1886)] are

$1, \quad 1 - t, \quad 2 - 4t + t^2, \quad 6 - 18t + 9t^2 - t^3$

(a) Show that the first four Laguerre polynomials form a basis
for $P_3$.

(b) Let B be the basis in part (a). Find the coordinate vector
of the polynomial

$\mathbf{p}(t) = -10t + 9t^2 - t^3$

relative to B.

27. Consider the coordinate vectors

$$[\mathbf{w}]_S = \begin{bmatrix} 6 \\ -1 \\ 4 \end{bmatrix}, \quad [\mathbf{q}]_S = \begin{bmatrix} 3 \\ 0 \\ 4 \end{bmatrix}, \quad [B]_S = \begin{bmatrix} -8 \\ 7 \\ 6 \\ 3 \end{bmatrix}$$

(a) Find w if S is the basis in Exercise 2.

(b) Find q if S is the basis in Exercise 3.

(c) Find B if S is the basis in Exercise 5.

28. The basis that we gave for $M_{22}$ in Example 4 consisted of noninvertible
matrices. Do you think that there is a basis for $M_{22}$
consisting of invertible matrices? Justify your answer.

Working with Proofs

29. Prove that $R^\infty$ is an infinite-dimensional vector space.

30. Let $T_A: R^n \to R^n$ be multiplication by an invertible matrix
A, and let $\{\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_n\}$ be a basis for $R^n$. Prove that
$\{T_A(\mathbf{u}_1), T_A(\mathbf{u}_2), \ldots, T_A(\mathbf{u}_n)\}$ is also a basis for $R^n$.

31. Prove that if V is a subspace of a vector space W and if V is 
infinite-dimensional, then so is W. 

True-False Exercises 

TF. In parts (a)-(e) determine whether the statement is true or 
false, and justify your answer. 

(a) If $V = \text{span}\{\mathbf{v}_1, \ldots, \mathbf{v}_n\}$, then $\{\mathbf{v}_1, \ldots, \mathbf{v}_n\}$ is a basis for V.

(b) Every linearly independent subset of a vector space V is a
basis for V.

(c) If $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}$ is a basis for a vector space V, then every
vector in V can be expressed as a linear combination of
$\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$.

(d) The coordinate vector of a vector x in $R^n$ relative to the standard
basis for $R^n$ is x.

(e) Every basis of $P_4$ contains at least one polynomial of degree 3
or less.

Working with Technology 

T1. Let V be the subspace of $P_3$ spanned by the vectors

$\mathbf{p}_1 = 1 + 5x - 3x^2 - 11x^3$, $\mathbf{p}_2 = 7 + 4x - x^2 + 2x^3$,
$\mathbf{p}_3 = 5 + x + 9x^2 + 2x^3$, $\mathbf{p}_4 = 3 - x + 7x^2 + 5x^3$

(a) Find a basis S for V.

(b) Find the coordinate vector of $\mathbf{p} = 19 + 18x - 13x^2 - 10x^3$
relative to the basis S you obtained in part (a).

T2. Let V be the subspace of $C^\infty(-\infty, \infty)$ spanned by the vectors
in the set

$B = \{1, \cos x, \cos^2 x, \cos^3 x, \cos^4 x, \cos^5 x\}$

and accept without proof that B is a basis for V. Confirm that
the following vectors are in V, and find their coordinate vectors
relative to B.

$\mathbf{f}_0 = 1$, $\mathbf{f}_1 = \cos x$, $\mathbf{f}_2 = \cos 2x$, $\mathbf{f}_3 = \cos 3x$,

$\mathbf{f}_4 = \cos 4x$, $\mathbf{f}_5 = \cos 5x$


4.5 Dimension 

We showed in the previous section that the standard basis for $R^n$ has n vectors and hence
that the standard basis for $R^3$ has three vectors, the standard basis for $R^2$ has two vectors, and
the standard basis for $R^1$ (= R) has one vector. Since we think of space as three-dimensional,
a plane as two-dimensional, and a line as one-dimensional, there seems to be a link between
the number of vectors in a basis and the dimension of a vector space. We will develop this
idea in this section.


Number of Vectors in a Basis

Our first goal in this section is to establish the following fundamental theorem.

THEOREM 4.5.1 All bases for a finite-dimensional vector space have the same number
of vectors.


To prove this theorem we will need the following preliminary result, whose proof is
deferred to the end of the section.

THEOREM 4.5.2 Let V be an n-dimensional vector space, and let $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}$ be
any basis.

(a) If a set in V has more than n vectors, then it is linearly dependent.

(b) If a set in V has fewer than n vectors, then it does not span V.


We can now see rather easily why Theorem 4.5.1 is true; for if

$$S = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}$$

is an arbitrary basis for V, then the linear independence of S implies that any set in V
with more than n vectors is linearly dependent and any set in V with fewer than n vectors
does not span V. Thus, unless a set in V has exactly n vectors it cannot be a basis.

We noted in the introduction to this section that for certain familiar vector spaces 
the intuitive notion of dimension coincides with the number of vectors in a basis. The 
following definition makes this idea precise. 


DEFINITION 1 The dimension of a finite-dimensional vector space V is denoted by
dim(V) and is defined to be the number of vectors in a basis for V. In addition, the
zero vector space is defined to have dimension zero.

Engineers often use the term degrees of freedom as a synonym for dimension.

► EXAMPLE 1 Dimensions of Some Familiar Vector Spaces

$\dim(R^n) = n$ [The standard basis has n vectors.]

$\dim(P_n) = n + 1$ [The standard basis has n + 1 vectors.]

$\dim(M_{mn}) = mn$ [The standard basis has mn vectors.]

► EXAMPLE 2 Dimension of Span(S) 

If $S = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r\}$ then every vector in span(S) is expressible as a linear combination
of the vectors in S. Thus, if the vectors in S are linearly independent, they automatically
form a basis for span(S), from which we can conclude that

$$\dim[\text{span}\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r\}] = r$$

In words, the dimension of the space spanned by a linearly independent set of vectors is 
equal to the number of vectors in that set. 


► EXAMPLE 3 Dimension of a Solution Space 

Find a basis for and the dimension of the solution space of the homogeneous system

$$\begin{aligned} x_1 + 3x_2 - 2x_3 \phantom{{}+8x_4} + 2x_5 \phantom{{}+18x_6} &= 0 \\ 2x_1 + 6x_2 - 5x_3 - 2x_4 + 4x_5 - 3x_6 &= 0 \\ 5x_3 + 10x_4 \phantom{{}+4x_5} + 15x_6 &= 0 \\ 2x_1 + 6x_2 \phantom{{}-5x_3} + 8x_4 + 4x_5 + 18x_6 &= 0 \end{aligned}$$

Solution In Example 6 of Section 1.2 we found the solution of this system to be

$$x_1 = -3r - 4s - 2t, \quad x_2 = r, \quad x_3 = -2s, \quad x_4 = s, \quad x_5 = t, \quad x_6 = 0$$

which can be written in vector form as

$$(x_1, x_2, x_3, x_4, x_5, x_6) = (-3r - 4s - 2t, \; r, \; -2s, \; s, \; t, \; 0)$$

or, alternatively, as

$$(x_1, x_2, x_3, x_4, x_5, x_6) = r(-3, 1, 0, 0, 0, 0) + s(-4, 0, -2, 1, 0, 0) + t(-2, 0, 0, 0, 1, 0)$$

This shows that the vectors

$$\mathbf{v}_1 = (-3, 1, 0, 0, 0, 0), \quad \mathbf{v}_2 = (-4, 0, -2, 1, 0, 0), \quad \mathbf{v}_3 = (-2, 0, 0, 0, 1, 0)$$

span the solution space. We leave it for you to check that these vectors are linearly
independent by showing that none of them is a linear combination of the other two (but
see the remark that follows). Thus, the solution space has dimension 3.


Remark It can be shown that for any homogeneous linear system, the method of the last example 
always produces a basis for the solution space of the system. We omit the formal proof. 
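
The check that Example 3 leaves to the reader can be sketched numerically. The following is an illustrative sketch, not the text's method: the `rank` helper is a hypothetical function written here, and it tests independence by row reduction rather than by the "no vector is a combination of the other two" argument suggested in the example.

```python
from fractions import Fraction

def rank(rows):
    """Number of pivots found by forward elimination with exact arithmetic."""
    M = [[Fraction(x) for x in row] for row in rows]
    r = 0
    ncols = len(M[0]) if M else 0
    for col in range(ncols):
        pivot = next((i for i in range(r, len(M)) if M[i][col] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, len(M)):
            factor = M[i][col] / M[r][col]
            M[i] = [x - factor * y for x, y in zip(M[i], M[r])]
        r += 1
    return r

# Coefficient matrix of the homogeneous system in Example 3.
A = [
    [1, 3, -2, 0, 2, 0],
    [2, 6, -5, -2, 4, -3],
    [0, 0, 5, 10, 0, 15],
    [2, 6, 0, 8, 4, 18],
]
# The three vectors read off from the general solution.
basis = [
    (-3, 1, 0, 0, 0, 0),
    (-4, 0, -2, 1, 0, 0),
    (-2, 0, 0, 0, 1, 0),
]

# Each basis vector should satisfy every equation of the system.
all_solutions = all(
    sum(a * x for a, x in zip(row, v)) == 0 for v in basis for row in A
)
# Three vectors of rank 3 are linearly independent.
independent = rank(basis) == len(basis)
# Free parameters = unknowns minus pivots of the coefficient matrix.
free_parameters = len(A[0]) - rank(A)
print(all_solutions, independent, free_parameters)
```

The count of free parameters agrees with the three parameters r, s, and t in the general solution.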


Some Fundamental Theorems

We will devote the remainder of this section to a series of theorems that reveal the subtle
interrelationships among the concepts of linear independence, spanning sets, basis, and
dimension. These theorems are not simply exercises in mathematical theory; they are
essential to the understanding of vector spaces and the applications that build on them.

We will start with a theorem (proved at the end of this section) that is concerned with
the effect on linear independence and spanning if a vector is added to or removed from
a nonempty set of vectors. Informally stated, if you start with a linearly independent set
S and adjoin to it a vector that is not a linear combination of those already in S, then
the enlarged set will still be linearly independent. Also, if you start with a set S of two
or more vectors in which one of the vectors is a linear combination of the others, then
that vector can be removed from S without affecting span(S) (Figure 4.5.1).



The vector outside the plane can be adjoined to the other two without affecting their
linear independence.

Any of the vectors can be removed, and the remaining two will still span the plane.

Either of the collinear vectors can be removed, and the remaining two will still span
the plane.

▲ Figure 4.5.1


THEOREM 4.5.3 Plus/Minus Theorem

Let S be a nonempty set of vectors in a vector space V.

(a) If S is a linearly independent set, and if v is a vector in V that is outside of
span(S), then the set $S \cup \{\mathbf{v}\}$ that results by inserting v into S is still linearly
independent.

(b) If v is a vector in S that is expressible as a linear combination of other vectors
in S, and if $S - \{\mathbf{v}\}$ denotes the set obtained by removing v from S, then S and
$S - \{\mathbf{v}\}$ span the same space; that is,

$$\text{span}(S) = \text{span}(S - \{\mathbf{v}\})$$

► EXAMPLE 4 Applying the Plus/Minus Theorem

Show that $\mathbf{p}_1 = 1 - x^2$, $\mathbf{p}_2 = 2 - x^2$, and $\mathbf{p}_3 = x^3$ are linearly independent vectors.

Solution The set $S = \{\mathbf{p}_1, \mathbf{p}_2\}$ is linearly independent since neither vector in S is a scalar
multiple of the other. Since the vector $\mathbf{p}_3$ cannot be expressed as a linear combination
of the vectors in S (why?), it can be adjoined to S to produce a linearly independent set
$S \cup \{\mathbf{p}_3\} = \{\mathbf{p}_1, \mathbf{p}_2, \mathbf{p}_3\}$.


In general, to show that a set of vectors $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}$ is a basis for a vector space V,
one must show that the vectors are linearly independent and span V. However, if we
happen to know that V has dimension n (so that $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}$ contains the right
number of vectors for a basis), then it suffices to check either linear independence or
spanning; the remaining condition will hold automatically. This is the content of the
following theorem.


THEOREM 4.5.4 Let V be an n-dimensional vector space, and let S be a set in V
with exactly n vectors. Then S is a basis for V if and only if S spans V or S is linearly
independent.


Proof Assume that S has exactly n vectors and spans V. To prove that S is a basis, we
must show that S is a linearly independent set. But if this is not so, then some vector v in
S is a linear combination of the remaining vectors. If we remove this vector from S, then
it follows from Theorem 4.5.3(b) that the remaining set of n - 1 vectors still spans V.
But this is impossible since Theorem 4.5.2(b) states that no set with fewer than n vectors
can span an n-dimensional vector space. Thus S is linearly independent.

Assume that S has exactly n vectors and is a linearly independent set. To prove
that S is a basis, we must show that S spans V. But if this is not so, then there is
some vector v in V that is not in span(S). If we insert this vector into S, then it follows
from Theorem 4.5.3(a) that this set of n + 1 vectors is still linearly independent.
But this is impossible, since Theorem 4.5.2(a) states that no set with more than n vectors
in an n-dimensional vector space can be linearly independent. Thus S spans V. ◄


► EXAMPLE 5 Bases by Inspection

(a) Explain why the vectors $\mathbf{v}_1 = (-3, 7)$ and $\mathbf{v}_2 = (5, 5)$ form a basis for $R^2$.

(b) Explain why the vectors $\mathbf{v}_1 = (2, 0, -1)$, $\mathbf{v}_2 = (4, 0, 7)$, and $\mathbf{v}_3 = (-1, 1, 4)$ form a
basis for $R^3$.

Solution (a) Since neither vector is a scalar multiple of the other, the two vectors form
a linearly independent set in the two-dimensional space $R^2$, and hence they form a basis
by Theorem 4.5.4.

Solution (b) The vectors $\mathbf{v}_1$ and $\mathbf{v}_2$ form a linearly independent set in the xz-plane (why?).
The vector $\mathbf{v}_3$ is outside of the xz-plane, so the set $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ is also linearly independent.
Since $R^3$ is three-dimensional, Theorem 4.5.4 implies that $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ is a basis for the
vector space $R^3$.
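
The conclusions of Example 5 can also be checked mechanically. The sketch below is one possible check under stated assumptions, not the inspection argument used in the example: the helpers `rank` and `is_basis_of_Rn` are hypothetical names, and the test relies on Theorem 4.5.4, so that n linearly independent vectors in $R^n$ are accepted as a basis without a separate spanning check.

```python
from fractions import Fraction

def rank(rows):
    """Number of pivots found by forward elimination with exact arithmetic."""
    M = [[Fraction(x) for x in row] for row in rows]
    r = 0
    ncols = len(M[0]) if M else 0
    for col in range(ncols):
        pivot = next((i for i in range(r, len(M)) if M[i][col] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, len(M)):
            factor = M[i][col] / M[r][col]
            M[i] = [x - factor * y for x, y in zip(M[i], M[r])]
        r += 1
    return r

def is_basis_of_Rn(vectors):
    """Exactly n vectors in R^n that are linearly independent (Theorem 4.5.4)."""
    n = len(vectors[0])
    return len(vectors) == n and rank(vectors) == n

ex5a = is_basis_of_Rn([(-3, 7), (5, 5)])                      # Example 5(a)
ex5b = is_basis_of_Rn([(2, 0, -1), (4, 0, 7), (-1, 1, 4)])    # Example 5(b)
# Illustrative non-example (not from the text): the second vector is twice the first.
not_basis = is_basis_of_Rn([(1, 2), (2, 4)])
print(ex5a, ex5b, not_basis)
```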


The next theorem (whose proof is deferred to the end of this section) reveals two
important facts about the vectors in a finite-dimensional vector space V:

1. Every spanning set for a subspace is either a basis for that subspace or has a basis
as a subset.

2. Every linearly independent set in a subspace is either a basis for that subspace or
can be extended to a basis for it.


THEOREM 4.5.5 Let S be a finite set of vectors in a finite-dimensional vector space V.

(a) If S spans V but is not a basis for V, then S can be reduced to a basis for V by
removing appropriate vectors from S.

(b) If S is a linearly independent set that is not already a basis for V, then S can be
enlarged to a basis for V by inserting appropriate vectors into S.
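
Both parts of this theorem suggest simple procedures in $R^n$, sketched below under stated assumptions. The function names `reduce_to_basis` and `extend_to_basis` are hypothetical, the sample vectors are illustrative rather than taken from the text, and the sketch makes two specific choices the theorem does not prescribe: it discards each vector that is a linear combination of its predecessors, and it draws candidate vectors for the extension from the standard basis.

```python
from fractions import Fraction

def rank(rows):
    """Number of pivots found by forward elimination with exact arithmetic."""
    M = [[Fraction(x) for x in row] for row in rows]
    r = 0
    ncols = len(M[0]) if M else 0
    for col in range(ncols):
        pivot = next((i for i in range(r, len(M)) if M[i][col] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, len(M)):
            factor = M[i][col] / M[r][col]
            M[i] = [x - factor * y for x, y in zip(M[i], M[r])]
        r += 1
    return r

def reduce_to_basis(spanning_set):
    """Part (a): keep a vector only if it is not in the span of those already kept."""
    basis = []
    for v in spanning_set:
        if rank(basis + [v]) > len(basis):
            basis.append(v)
    return basis

def extend_to_basis(independent_set, n):
    """Part (b): adjoin standard basis vectors of R^n that lie outside the current span."""
    basis = list(independent_set)
    for i in range(n):
        e = tuple(1 if j == i else 0 for j in range(n))
        if len(basis) < n and rank(basis + [e]) > len(basis):
            basis.append(e)
    return basis

reduced = reduce_to_basis([(1, 0), (2, 0), (0, 1), (1, 1)])
extended = extend_to_basis([(1, 1, 0)], 3)
print(reduced)
print(extended)
```

Each acceptance step is an application of part (a) of the Plus/Minus Theorem, and each rejection is justified by part (b).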


We conclude this section with a theorem that relates the dimension of a vector space 
to the dimensions of its subspaces. 


THEOREM 4.5.6 If W is a subspace of a finite-dimensional vector space V, then:

(a) W is finite-dimensional.

(b) $\dim(W) \le \dim(V)$.

(c) W = V if and only if $\dim(W) = \dim(V)$.

Proof (a) We will leave the proof of this part as an exercise.

Proof (b) Part (a) shows that W is finite-dimensional, so it has a basis

$$S = \{\mathbf{w}_1, \mathbf{w}_2, \ldots, \mathbf{w}_m\}$$

Either S is also a basis for V or it is not. If so, then $\dim(V) = m$, which means that
$\dim(V) = \dim(W)$. If not, then because S is a linearly independent set it can be enlarged
to a basis for V by part (b) of Theorem 4.5.5. But this implies that $\dim(W) < \dim(V)$,
so we have shown that $\dim(W) \le \dim(V)$ in all cases.

Proof (c) Assume that $\dim(W) = \dim(V)$ and that

$$S = \{\mathbf{w}_1, \mathbf{w}_2, \ldots, \mathbf{w}_m\}$$

is a basis for W. If S is not also a basis for V, then being linearly independent S can
be extended to a basis for V by part (b) of Theorem 4.5.5. But this would mean that
$\dim(V) > \dim(W)$, which contradicts our hypothesis. Thus S must also be a basis for
V, which means that W = V. The converse is obvious.

Figure 4.5.2 illustrates the geometric relationship between the subspaces of $R^3$ in
order of increasing dimension.

OPTIONAL


We conclude this section with optional proofs of Theorems 4.5.2, 4.5.3, and 4.5.5.


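Proof of Theorem 4.5.2(a) Let $S' = \{\mathbf{w}_1, \mathbf{w}_2, \ldots, \mathbf{w}_m\}$ be any set of m vectors in V, where
m > n. We want to show that S' is linearly dependent. Since $S = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}$ is a
basis, each $\mathbf{w}_i$ can be expressed as a linear combination of the vectors in S, say

$$\begin{aligned} \mathbf{w}_1 &= a_{11}\mathbf{v}_1 + a_{21}\mathbf{v}_2 + \cdots + a_{n1}\mathbf{v}_n \\ \mathbf{w}_2 &= a_{12}\mathbf{v}_1 + a_{22}\mathbf{v}_2 + \cdots + a_{n2}\mathbf{v}_n \\ &\;\;\vdots \\ \mathbf{w}_m &= a_{1m}\mathbf{v}_1 + a_{2m}\mathbf{v}_2 + \cdots + a_{nm}\mathbf{v}_n \end{aligned} \tag{1}$$

To show that S' is linearly dependent, we must find scalars $k_1, k_2, \ldots, k_m$, not all zero,
such that

$$k_1\mathbf{w}_1 + k_2\mathbf{w}_2 + \cdots + k_m\mathbf{w}_m = \mathbf{0} \tag{2}$$

We leave it for you to verify that the equations in (1) can be rewritten in the partitioned
form

$$[\mathbf{w}_1 \mid \mathbf{w}_2 \mid \cdots \mid \mathbf{w}_m] = [\mathbf{v}_1 \mid \mathbf{v}_2 \mid \cdots \mid \mathbf{v}_n] \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1m} \\ a_{21} & a_{22} & \cdots & a_{2m} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nm} \end{bmatrix} \tag{3}$$

Since m > n, the linear system

$$\begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1m} \\ a_{21} & a_{22} & \cdots & a_{2m} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nm} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_m \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix} \tag{4}$$

has more unknowns than equations and hence has a nontrivial solution

$$x_1 = k_1, \quad x_2 = k_2, \quad \ldots, \quad x_m = k_m$$

Creating a column vector from this solution and multiplying both sides of (3) on the
right by this vector yields

$$[\mathbf{w}_1 \mid \mathbf{w}_2 \mid \cdots \mid \mathbf{w}_m] \begin{bmatrix} k_1 \\ k_2 \\ \vdots \\ k_m \end{bmatrix} = [\mathbf{v}_1 \mid \mathbf{v}_2 \mid \cdots \mid \mathbf{v}_n] \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1m} \\ a_{21} & a_{22} & \cdots & a_{2m} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nm} \end{bmatrix} \begin{bmatrix} k_1 \\ k_2 \\ \vdots \\ k_m \end{bmatrix}$$

By (4), this simplifies to

$$[\mathbf{w}_1 \mid \mathbf{w}_2 \mid \cdots \mid \mathbf{w}_m] \begin{bmatrix} k_1 \\ k_2 \\ \vdots \\ k_m \end{bmatrix} = \mathbf{0}$$

which we can rewrite as

$$k_1\mathbf{w}_1 + k_2\mathbf{w}_2 + \cdots + k_m\mathbf{w}_m = \mathbf{0}$$

Since the scalar coefficients in this equation are not all zero, we have proved that
$S' = \{\mathbf{w}_1, \mathbf{w}_2, \ldots, \mathbf{w}_m\}$ is linearly dependent.

The proof of Theorem 4.5.2(b) closely parallels that of Theorem 4.5.2(a) and will be
omitted.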

Proof of Theorem 4.5.3(a) Assume that $S = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r\}$ is a linearly independent
set of vectors in V, and v is a vector in V that is outside of span(S). To show that
$S' = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r, \mathbf{v}\}$ is a linearly independent set, we must show that the only scalars
that satisfy

$$k_1\mathbf{v}_1 + k_2\mathbf{v}_2 + \cdots + k_r\mathbf{v}_r + k_{r+1}\mathbf{v} = \mathbf{0} \tag{5}$$

are $k_1 = k_2 = \cdots = k_r = k_{r+1} = 0$. But it must be true that $k_{r+1} = 0$ for otherwise we
could solve (5) for v as a linear combination of $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r$, contradicting the assumption
that v is outside of span(S). Thus, (5) simplifies to

$$k_1\mathbf{v}_1 + k_2\mathbf{v}_2 + \cdots + k_r\mathbf{v}_r = \mathbf{0} \tag{6}$$

which, by the linear independence of $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r\}$, implies that

$$k_1 = k_2 = \cdots = k_r = 0$$


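Proof of Theorem 4.5.3(b) Assume that $S = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r\}$ is a set of vectors in V, and
(to be specific) suppose that $\mathbf{v}_r$ is a linear combination of $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_{r-1}$, say

$$\mathbf{v}_r = c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_{r-1}\mathbf{v}_{r-1} \tag{7}$$

We want to show that if $\mathbf{v}_r$ is removed from S, then the remaining set of vectors
$\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_{r-1}\}$ still spans span(S); that is, we must show that every vector w in span(S)
is expressible as a linear combination of $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_{r-1}\}$. But if w is in span(S), then
w is expressible in the form

$$\mathbf{w} = k_1\mathbf{v}_1 + k_2\mathbf{v}_2 + \cdots + k_{r-1}\mathbf{v}_{r-1} + k_r\mathbf{v}_r$$

or, on substituting (7),

$$\mathbf{w} = k_1\mathbf{v}_1 + k_2\mathbf{v}_2 + \cdots + k_{r-1}\mathbf{v}_{r-1} + k_r(c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_{r-1}\mathbf{v}_{r-1})$$

which expresses w as a linear combination of $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_{r-1}$.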
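
The last step can be made explicit by collecting the coefficient of each remaining vector; the grouping below follows directly from the substituted expression and uses only the symbols already introduced in the proof.

```latex
\[
\mathbf{w} = (k_1 + k_r c_1)\mathbf{v}_1 + (k_2 + k_r c_2)\mathbf{v}_2 + \cdots
           + (k_{r-1} + k_r c_{r-1})\mathbf{v}_{r-1}
\]
```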

Proof of Theorem 4.5.5(a) If S is a set of vectors that spans V but is not a basis for V,
then S is a linearly dependent set. Thus some vector v in S is expressible as a linear
combination of the other vectors in S. By the Plus/Minus Theorem (4.5.3b), we can
remove v from S, and the resulting set S' will still span V. If S' is linearly independent,
then S' is a basis for V, and we are done. If S' is linearly dependent, then we can remove
some appropriate vector from S' to produce a set S'' that still spans V. We can continue
removing vectors in this way until we finally arrive at a set of vectors in S that is linearly
independent and spans V. This subset of S is a basis for V.

Proof of Theorem 4.5.5(b) Suppose that dim(V) = n. If S is a linearly independent set
that is not already a basis for V, then S fails to span V, so there is some vector v in V
that is not in span(S). By the Plus/Minus Theorem (4.5.3a), we can insert v into S, and
the resulting set S' will still be linearly independent. If S' spans V, then S' is a basis for
V, and we are finished. If S' does not span V, then we can insert an appropriate vector
into S' to produce a set S'' that is still linearly independent. We can continue inserting
vectors in this way until we reach a set with n linearly independent vectors in V. This set
will be a basis for V by Theorem 4.5.4.

Exercise Set 4.5


In Exercises 1-6, find a basis for the solution space of the homogeneous
linear system, and find the dimension of that space.

1. $x_1 + x_2 - x_3 = 0$
   $-2x_1 - x_2 + 2x_3 = 0$
   $-x_1 + x_3 = 0$

2. $3x_1 + x_2 + x_3 + x_4 = 0$
   $5x_1 - x_2 + x_3 - x_4 = 0$

3. $2x_1 + x_2 + 3x_3 = 0$
   $x_1 + 5x_3 = 0$
   $x_2 + x_3 = 0$

4. $x_1 - 4x_2 + 3x_3 - x_4 = 0$
   $2x_1 - 8x_2 + 6x_3 - 2x_4 = 0$

5. $x_1 - 3x_2 + x_3 = 0$
   $2x_1 - 6x_2 + 2x_3 = 0$
   $3x_1 - 9x_2 + 3x_3 = 0$

6. $x + y + z = 0$
   $3x + 2y - 2z = 0$
   $4x + 3y - z = 0$
   $6x + 5y + z = 0$

7. In each part, find a basis for the given subspace of $R^3$, and
state its dimension.

(a) The plane $3x - 2y + 5z = 0$.

(b) The plane $x - y = 0$.

(c) The line $x = 2t$, $y = -t$, $z = 4t$.

(d) All vectors of the form (a, b, c), where b = a + c.

8. In each part, find a basis for the given subspace of $R^4$, and
state its dimension.

(a) All vectors of the form (a, b, c, 0).

(b) All vectors of the form (a, b, c, d), where d = a + b and
c = a - b.

(c) All vectors of the form (a, b, c, d), where a = b = c = d.

9. Find the dimension of each of the following vector spaces.

(a) The vector space of all diagonal n × n matrices.

(b) The vector space of all symmetric n × n matrices.

(c) The vector space of all upper triangular n × n matrices.

10. Find the dimension of the subspace of $P_3$ consisting of all
polynomials $a_0 + a_1 x + a_2 x^2 + a_3 x^3$ for which $a_0 = 0$.

11. (a) Show that the set W of all polynomials in $P_2$ such that
p(1) = 0 is a subspace of $P_2$.

(b) Make a conjecture about the dimension of W.

(c) Confirm your conjecture by finding a basis for W.

12. Find a standard basis vector for $R^3$ that can be added to the
set $\{\mathbf{v}_1, \mathbf{v}_2\}$ to produce a basis for $R^3$.

(a) $\mathbf{v}_1 = (-1, 2, 3)$, $\mathbf{v}_2 = (1, -2, -2)$

(b) $\mathbf{v}_1 = (1, -1, 0)$, $\mathbf{v}_2 = (3, 1, -2)$

13. Find standard basis vectors for $R^4$ that can be added to the
set $\{\mathbf{v}_1, \mathbf{v}_2\}$ to produce a basis for $R^4$.

$\mathbf{v}_1 = (1, -4, 2, -3)$, $\mathbf{v}_2 = (-3, 8, -4, 6)$

14. Let $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ be a basis for a vector space V. Show that
$\{\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3\}$ is also a basis, where $\mathbf{u}_1 = \mathbf{v}_1$, $\mathbf{u}_2 = \mathbf{v}_1 + \mathbf{v}_2$, and
$\mathbf{u}_3 = \mathbf{v}_1 + \mathbf{v}_2 + \mathbf{v}_3$.

15. The vectors $\mathbf{v}_1 = (1, -2, 3)$ and $\mathbf{v}_2 = (0, 5, -3)$ are linearly
independent. Enlarge $\{\mathbf{v}_1, \mathbf{v}_2\}$ to a basis for $R^3$.

16. The vectors $\mathbf{v}_1 = (1, 0, 0, 0)$ and $\mathbf{v}_2 = (1, 1, 0, 0)$ are linearly
independent. Enlarge $\{\mathbf{v}_1, \mathbf{v}_2\}$ to a basis for $R^4$.

17. Find a basis for the subspace of $R^3$ that is spanned by the
vectors

$\mathbf{v}_1 = (1, 0, 0)$, $\mathbf{v}_2 = (1, 0, 1)$, $\mathbf{v}_3 = (2, 0, 1)$, $\mathbf{v}_4 = (0, 0, -1)$

18. Find a basis for the subspace of $R^4$ that is spanned by the
vectors

$\mathbf{v}_1 = (1, 1, 1, 1)$, $\mathbf{v}_2 = (2, 2, 2, 0)$, $\mathbf{v}_3 = (0, 0, 0, 3)$,
$\mathbf{v}_4 = (3, 3, 3, 4)$


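19. In each part, let $T_A: R^3 \to R^3$ be multiplication by A and find
the dimension of the subspace of $R^3$ consisting of all vectors
x for which $T_A(\mathbf{x}) = \mathbf{0}$.

(a) $A = \begin{bmatrix} 1 & 1 & 0 \\ 1 & 0 & 1 \\ 1 & 0 & 1 \end{bmatrix}$ (b) $A = \begin{bmatrix} 1 & 2 & 0 \\ 1 & 2 & 0 \\ 1 & 2 & 0 \end{bmatrix}$

(c) $A = \begin{bmatrix} 1 & 0 & 0 \\ -1 & 1 & 0 \\ 1 & 1 & 1 \end{bmatrix}$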


20. In each part, let $T_A$ be multiplication by A and find the dimension
of the subspace of $R^4$ consisting of all vectors x for which
$T_A(\mathbf{x}) = \mathbf{0}$.


(a) A = 



0 

4 


1 

0 

0 

1 

f 

(b) A = 

-1 

1 

0 

0 

- 

1 

0 

0 

1 


Working with Proofs 


21. (a) Prove that for every positive integer n, one can find n + 1
linearly independent vectors in $F(-\infty, \infty)$. [Hint: Look
for polynomials.]

(b) Use the result in part (a) to prove that $F(-\infty, \infty)$ is
infinite-dimensional.

(c) Prove that $C(-\infty, \infty)$, $C^m(-\infty, \infty)$, and $C^\infty(-\infty, \infty)$ are
infinite-dimensional.

22. Let S be a basis for an n-dimensional vector space V. Prove
that if $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r$ form a linearly independent set of vectors
in V, then the coordinate vectors $(\mathbf{v}_1)_S, (\mathbf{v}_2)_S, \ldots, (\mathbf{v}_r)_S$ form
a linearly independent set in $R^n$, and conversely.

23. Let $S = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r\}$ be a nonempty set of vectors in an
n-dimensional vector space V. Prove that if the vectors in
S span V, then the coordinate vectors $(\mathbf{v}_1)_S, (\mathbf{v}_2)_S, \ldots, (\mathbf{v}_r)_S$
span $R^n$, and conversely.

24. Prove part (a) of Theorem 4.5.6. 

25. Prove: A subspace of a finite-dimensional vector space is 
finite-dimensional. 

26. State the two parts of Theorem 4.5.2 in contrapositive form. 

27. In each part, let S be the standard basis for $P_2$. Use the results
proved in Exercises 22 and 23 to find a basis for the subspace
of $P_2$ spanned by the given vectors.

(a) $-1 + x - 2x^2$, $3 + 3x + 6x^2$, $9$

(b) $1 + x$, $x^2$, $2 + 2x + 3x^2$

(c) $1 + x - 3x^2$, $2 + 2x - 6x^2$, $3 + 3x - 9x^2$

True-False Exercises 

TF. In parts (a)-(k) determine whether the statement is true or
false, and justify your answer.

(a) The zero vector space has dimension zero.

(b) There is a set of 17 linearly independent vectors in $R^{17}$.

(c) There is a set of 11 vectors that span $R^{17}$.

(d) Every linearly independent set of five vectors in $R^5$ is a basis
for $R^5$.

(e) Every set of five vectors that spans $R^5$ is a basis for $R^5$.

(f) Every set of vectors that spans $R^n$ contains a basis for $R^n$.

(g) Every linearly independent set of vectors in $R^n$ is contained in
some basis for $R^n$.

(h) There is a basis for M 2 2 consisting of invertible matrices. 

(i) If A has size n x n and A, A 2 , . . . , A' r are distinct matri- 
ces, then {/„, A, A 2 , . . . , A"”} is a linearly dependent set. 

(j) There are at least two distinct three-dimensional subspaces 
of/V 

(k) There are only three distinct two-dimensional subspaces of /V 

Working with Technology 

T1. Devise three different procedures for using your technology
utility to determine the dimension of the subspace spanned by a
set of vectors in R^n, and then use each of those procedures to
determine the dimension of the subspace of R^5 spanned by the
vectors

v_1 = (2, 2, −1, 0, 1), v_2 = (−1, −1, 2, −3, 1),

v_3 = (1, 1, −2, 0, −1), v_4 = (0, 0, 1, 1, 1)

T2. Find a basis for the row space of A by starting at the top and
successively removing each row that is a linear combination of its
predecessors.

\[
A = \begin{bmatrix}
3.4 & 2.2 & 1.0 & -1.8 \\
2.1 & 3.6 & 4.0 & -3.4 \\
8.9 & 8.0 & 6.0 & 7.0 \\
7.6 & 9.4 & 9.0 & -8.6 \\
1.0 & 2.2 & 0.0 & 2.2
\end{bmatrix}
\]


4.6 Change of Basis 

A basis that is suitable for one problem may not be suitable for another, so it is a common 
process in the study of vector spaces to change from one basis to another. Because a basis is 
the vector space generalization of a coordinate system, changing bases is akin to changing 
coordinate axes in R^2 and R^3. In this section we will study problems related to changing
bases. 

Coordinate Maps

If S = {v_1, v_2, ..., v_n} is a basis for a finite-dimensional vector space V, and if

(v)_S = (c_1, c_2, ..., c_n)

is the coordinate vector of v relative to S, then, as illustrated in Figure 4.4.6, the mapping

v → (v)_S (1)

creates a connection (a one-to-one correspondence) between vectors in the general vector
space V and vectors in the Euclidean vector space R^n. We call (1) the coordinate map
relative to S from V to R^n. In this section we will find it convenient to express coordinate
vectors in the matrix form

\[
[\mathbf{v}]_S = \begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_n \end{bmatrix} \tag{2}
\]

where the square brackets emphasize the matrix notation (Figure 4.6.1).

▲ Figure 4.6.1 The coordinate map from V to R^n

Change of Basis

There are many applications in which it is necessary to work with more than one coor- 
dinate system. In such cases it becomes important to know how the coordinates of a 
fixed vector relative to each coordinate system are related. This leads to the following 
problem. 


The Change-of-Basis Problem If v is a vector in a finite-dimensional vector space V,
and if we change the basis for V from a basis B to a basis B', how are the coordinate
vectors [v]_B and [v]_{B'} related?


Remark To solve this problem, it will be convenient to refer to B as the “old basis” and B' as 
the “new basis.” Thus, our objective is to find a relationship between the old and new coordinates 
of a fixed vector v in V. 


For simplicity, we will solve this problem for two-dimensional spaces. The solution 
for n -dimensional spaces is similar. Let 

B = {u_1, u_2} and B' = {u'_1, u'_2}

be the old and new bases, respectively. We will need the coordinate vectors for the new
basis vectors relative to the old basis. Suppose they are

\[
[\mathbf{u}'_1]_B = \begin{bmatrix} a \\ b \end{bmatrix}
\quad \text{and} \quad
[\mathbf{u}'_2]_B = \begin{bmatrix} c \\ d \end{bmatrix} \tag{3}
\]

That is,

\[
\begin{aligned}
\mathbf{u}'_1 &= a\mathbf{u}_1 + b\mathbf{u}_2 \\
\mathbf{u}'_2 &= c\mathbf{u}_1 + d\mathbf{u}_2
\end{aligned} \tag{4}
\]

Now let v be any vector in V, and let

\[
[\mathbf{v}]_{B'} = \begin{bmatrix} k_1 \\ k_2 \end{bmatrix} \tag{5}
\]

be the new coordinate vector, so that

\[
\mathbf{v} = k_1\mathbf{u}'_1 + k_2\mathbf{u}'_2 \tag{6}
\]

In order to find the old coordinates of v, we must express v in terms of the old basis B.
To do this, we substitute (4) into (6). This yields

\[
\mathbf{v} = k_1(a\mathbf{u}_1 + b\mathbf{u}_2) + k_2(c\mathbf{u}_1 + d\mathbf{u}_2)
\]

or

\[
\mathbf{v} = (k_1 a + k_2 c)\mathbf{u}_1 + (k_1 b + k_2 d)\mathbf{u}_2
\]

Thus, the old coordinate vector for v is

\[
[\mathbf{v}]_B = \begin{bmatrix} k_1 a + k_2 c \\ k_1 b + k_2 d \end{bmatrix}
\]

which, by using (5), can be written as

\[
[\mathbf{v}]_B = \begin{bmatrix} a & c \\ b & d \end{bmatrix} \begin{bmatrix} k_1 \\ k_2 \end{bmatrix}
= \begin{bmatrix} a & c \\ b & d \end{bmatrix} [\mathbf{v}]_{B'}
\]

This equation states that the old coordinate vector [v]_B results when we multiply the new
coordinate vector [v]_{B'} on the left by the matrix

\[
P = \begin{bmatrix} a & c \\ b & d \end{bmatrix}
\]

Since the columns of this matrix are the coordinates of the new basis vectors relative to
the old basis [see (3)], we have the following solution of the change-of-basis problem.


Solution of the Change-of-Basis Problem If we change the basis for a vector space V
from an old basis B = {u_1, u_2, ..., u_n} to a new basis B' = {u'_1, u'_2, ..., u'_n}, then for
each vector v in V, the old coordinate vector [v]_B is related to the new coordinate
vector [v]_{B'} by the equation

\[
[\mathbf{v}]_B = P[\mathbf{v}]_{B'} \tag{7}
\]

where the columns of P are the coordinate vectors of the new basis vectors relative
to the old basis; that is, the column vectors of P are

\[
[\mathbf{u}'_1]_B,\ [\mathbf{u}'_2]_B,\ \ldots,\ [\mathbf{u}'_n]_B \tag{8}
\]


Transition Matrices

The matrix P in Equation (7) is called the transition matrix from B' to B. For emphasis,
we will often denote it by P_{B'→B}. It follows from (8) that this matrix can be expressed
in terms of its column vectors as

\[
P_{B' \to B} = \big[\, [\mathbf{u}'_1]_B \mid [\mathbf{u}'_2]_B \mid \cdots \mid [\mathbf{u}'_n]_B \,\big] \tag{9}
\]

Similarly, the transition matrix from B to B' can be expressed in terms of its column
vectors as

\[
P_{B \to B'} = \big[\, [\mathbf{u}_1]_{B'} \mid [\mathbf{u}_2]_{B'} \mid \cdots \mid [\mathbf{u}_n]_{B'} \,\big] \tag{10}
\]


Remark There is a simple way to remember both of these formulas using the terms “old basis” 
and “new basis” defined earlier in this section: In Formula (9) the old basis is B' and the new basis 
is B, whereas in Formula (10) the old basis is B and the new basis is B' . Thus, both formulas can 
be restated as follows: 

The columns of the transition matrix from an old basis to a new basis are the coordinate 
vectors of the old basis relative to the new basis. 


► EXAMPLE 1 Finding Transition Matrices

Consider the bases B = {u_1, u_2} and B' = {u'_1, u'_2} for R^2, where

u_1 = (1, 0), u_2 = (0, 1), u'_1 = (1, 1), u'_2 = (2, 1)

(a) Find the transition matrix P_{B'→B} from B' to B.

(b) Find the transition matrix P_{B→B'} from B to B'.

Solution (a) Here the old basis vectors are u'_1 and u'_2 and the new basis vectors are u_1
and u_2. We want to find the coordinate matrices of the old basis vectors u'_1 and u'_2 relative
to the new basis vectors u_1 and u_2. To do this, observe that

\[
\begin{aligned}
\mathbf{u}'_1 &= \mathbf{u}_1 + \mathbf{u}_2 \\
\mathbf{u}'_2 &= 2\mathbf{u}_1 + \mathbf{u}_2
\end{aligned}
\]

from which it follows that

\[
[\mathbf{u}'_1]_B = \begin{bmatrix} 1 \\ 1 \end{bmatrix}
\quad \text{and} \quad
[\mathbf{u}'_2]_B = \begin{bmatrix} 2 \\ 1 \end{bmatrix}
\]

and hence that

\[
P_{B' \to B} = \begin{bmatrix} 1 & 2 \\ 1 & 1 \end{bmatrix}
\]

Solution (b) Here the old basis vectors are u_1 and u_2 and the new basis vectors are u'_1
and u'_2. As in part (a), we want to find the coordinate matrices of the old basis vectors
u_1 and u_2 relative to the new basis vectors u'_1 and u'_2. To do this, observe that

\[
\begin{aligned}
\mathbf{u}_1 &= -\mathbf{u}'_1 + \mathbf{u}'_2 \\
\mathbf{u}_2 &= 2\mathbf{u}'_1 - \mathbf{u}'_2
\end{aligned}
\]

from which it follows that

\[
[\mathbf{u}_1]_{B'} = \begin{bmatrix} -1 \\ 1 \end{bmatrix}
\quad \text{and} \quad
[\mathbf{u}_2]_{B'} = \begin{bmatrix} 2 \\ -1 \end{bmatrix}
\]

and hence that

\[
P_{B \to B'} = \begin{bmatrix} -1 & 2 \\ 1 & -1 \end{bmatrix}
\]

◄


Invertibility of Transition Matrices

Suppose now that B and B' are bases for a finite-dimensional vector space V. Since
multiplication by P_{B'→B} maps coordinate vectors relative to the basis B' into coordinate
vectors relative to a basis B, and P_{B→B'} maps coordinate vectors relative to B into
coordinate vectors relative to B', it follows that for every vector v in V we have

\[
[\mathbf{v}]_B = P_{B' \to B}[\mathbf{v}]_{B'} \tag{11}
\]

\[
[\mathbf{v}]_{B'} = P_{B \to B'}[\mathbf{v}]_B \tag{12}
\]


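► EXAMPLE 2 Computing Coordinate Vectors

Let B and B' be the bases in Example 1. Use an appropriate formula to find [v]_B given
that

\[
[\mathbf{v}]_{B'} = \begin{bmatrix} -3 \\ 5 \end{bmatrix}
\]

Solution To find [v]_B we need to make the transition from B' to B. It follows from
Formula (11) and part (a) of Example 1 that

\[
[\mathbf{v}]_B = P_{B' \to B}[\mathbf{v}]_{B'}
= \begin{bmatrix} 1 & 2 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} -3 \\ 5 \end{bmatrix}
= \begin{bmatrix} 7 \\ 2 \end{bmatrix}
\]

◄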
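The computation in Example 2 can be checked mechanically. The following is a minimal sketch, not part of the original text; the helper name `mat_vec` is a hypothetical choice made for illustration. It applies Formula (11) and then confirms that both coordinate vectors describe the same vector of R^2.

```python
def mat_vec(P, x):
    """Multiply a matrix (list of rows) by a column vector (list of entries)."""
    return [sum(p * xi for p, xi in zip(row, x)) for row in P]

# Transition matrix from B' to B found in Example 1(a).
P_Bprime_to_B = [[1, 2],
                 [1, 1]]

v_Bprime = [-3, 5]                       # [v]_{B'} as given in Example 2
v_B = mat_vec(P_Bprime_to_B, v_Bprime)   # Formula (11): [v]_B = P_{B'->B} [v]_{B'}

# Cross-check: rebuild the vector v itself from each coordinate vector.
u1, u2 = (1, 0), (0, 1)                  # old basis B
u1p, u2p = (1, 1), (2, 1)                # new basis B'
v_from_Bprime = [v_Bprime[0] * a + v_Bprime[1] * b for a, b in zip(u1p, u2p)]
v_from_B = [v_B[0] * a + v_B[1] * b for a, b in zip(u1, u2)]

print(v_B, v_from_Bprime == v_from_B)
```

Because B is the standard basis here, [v]_B coincides with v itself, which is why the cross-check is so direct.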


If B and B' are bases for a finite-dimensional vector space V, then

\[
(P_{B' \to B})(P_{B \to B'}) = P_{B \to B}
\]

because multiplication by the product (P_{B'→B})(P_{B→B'}) first maps the B-coordinates of a
vector into its B'-coordinates, and then maps those B'-coordinates back into the original
B-coordinates. Since the net effect of the two operations is to leave each coordinate vector
unchanged, we are led to conclude that P_{B→B} must be the identity matrix, that is,

\[
(P_{B' \to B})(P_{B \to B'}) = I \tag{13}
\]

(we omit the formal proof). For example, for the transition matrices obtained in Example
1 we have

\[
(P_{B' \to B})(P_{B \to B'})
= \begin{bmatrix} 1 & 2 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} -1 & 2 \\ 1 & -1 \end{bmatrix}
= \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = I
\]

It follows from (13) that P_{B'→B} is invertible and that its inverse is P_{B→B'}. Thus, we
have the following theorem.


THEOREM 4.6.1 If P is the transition matrix from a basis B' to a basis B for a finite-
dimensional vector space V, then P is invertible and P^{-1} is the transition matrix from
B to B'.
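As an illustration of the theorem, the sketch below inverts the matrix P_{B'→B} from Example 1 and compares the result with P_{B→B'}. It is only a small check for the 2 × 2 case; the function names `mat_mul` and `inverse_2x2` are hypothetical helpers introduced here, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

def mat_mul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def inverse_2x2(M):
    """Inverse of a 2 x 2 matrix via the adjugate formula."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("matrix is not invertible")
    return [[d / det, -b / det],
            [-c / det, a / det]]

P = [[1, 2], [1, 1]]     # P_{B'->B} from Example 1(a)
Q = [[-1, 2], [1, -1]]   # P_{B->B'} from Example 1(b)

print(inverse_2x2(P) == Q)                 # the inverse of P is the transition matrix back
print(mat_mul(P, Q) == [[1, 0], [0, 1]])   # Formula (13)
```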


An Efficient Method for Computing Transition Matrices for R^n


Our next objective is to develop an efficient procedure for computing transition matrices
between bases for R^n. As illustrated in Example 1, the first step in computing a transition
matrix is to express each new basis vector as a linear combination of the old basis vectors.
For R^n this involves solving n linear systems of n equations in n unknowns, each of which
has the same coefficient matrix (why?). An efficient way to do this is by the method
illustrated in Example 2 of Section 1.6, which is as follows:


A Procedure for Computing P_{B→B'}

Step 1. Form the matrix [B' | B].

Step 2. Use elementary row operations to reduce the matrix in Step 1 to reduced row
echelon form.

Step 3. The resulting matrix will be [I | P_{B→B'}].

Step 4. Extract the matrix P_{B→B'} from the right side of the matrix in Step 3.


This procedure is captured in the following diagram.

[new basis | old basis] --(row operations)--> [I | transition from old to new] (14)
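The four steps above can be sketched in code. This is one possible implementation under stated assumptions, not the text's own: the function names `rref` and `transition_matrix` are hypothetical, bases are passed as lists of vectors in R^n, and exact fractions are used so that the reduced matrix is free of rounding error.

```python
from fractions import Fraction

def rref(M):
    """Reduced row echelon form of M (a list of rows), using exact fractions."""
    A = [[Fraction(x) for x in row] for row in M]
    rows, cols = len(A), len(A[0])
    pivot_row = 0
    for col in range(cols):
        if pivot_row == rows:
            break
        # Find a row at or below pivot_row with a nonzero entry in this column.
        pr = next((r for r in range(pivot_row, rows) if A[r][col] != 0), None)
        if pr is None:
            continue
        A[pivot_row], A[pr] = A[pr], A[pivot_row]
        pivot = A[pivot_row][col]
        A[pivot_row] = [x / pivot for x in A[pivot_row]]      # create the leading 1
        for r in range(rows):
            if r != pivot_row and A[r][col] != 0:
                factor = A[r][col]
                A[r] = [x - factor * y for x, y in zip(A[r], A[pivot_row])]
        pivot_row += 1
    return A

def transition_matrix(old_basis, new_basis):
    """Transition matrix from old_basis to new_basis, following Formula (14)."""
    n = len(old_basis)
    # Step 1: the basis vectors become columns of [new basis | old basis].
    augmented = [[new_basis[j][i] for j in range(n)] +
                 [old_basis[j][i] for j in range(n)]
                 for i in range(n)]
    reduced = rref(augmented)             # Steps 2 and 3: reduce to [I | P]
    return [row[n:] for row in reduced]   # Step 4: extract the right-hand block

B = [(1, 0), (0, 1)]          # bases from Example 1
B_prime = [(1, 1), (2, 1)]
print(transition_matrix(B, B_prime))   # P_{B->B'}
print(transition_matrix(B_prime, B))   # P_{B'->B}
```

The sketch assumes that `new_basis` really is a basis for R^n; otherwise the left block does not reduce to the identity and the extracted block is not a transition matrix.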

► EXAMPLE 3 Example 1 Revisited

In Example 1 we considered the bases B = {u_1, u_2} and B' = {u'_1, u'_2} for R^2, where

u_1 = (1, 0), u_2 = (0, 1), u'_1 = (1, 1), u'_2 = (2, 1)

(a) Use Formula (14) to find the transition matrix from B' to B.

(b) Use Formula (14) to find the transition matrix from B to B'.


Solution (a) Here B' is the old basis and B is the new basis, so

\[
[\text{new basis} \mid \text{old basis}] =
\left[\begin{array}{cc|cc} 1 & 0 & 1 & 2 \\ 0 & 1 & 1 & 1 \end{array}\right]
\]

Since the left side is already the identity matrix, no reduction is needed. We see by
inspection that the transition matrix is

\[
P_{B' \to B} = \begin{bmatrix} 1 & 2 \\ 1 & 1 \end{bmatrix}
\]

which agrees with the result in Example 1.

Solution (b) Here B is the old basis and B' is the new basis, so

\[
[\text{new basis} \mid \text{old basis}] =
\left[\begin{array}{cc|cc} 1 & 2 & 1 & 0 \\ 1 & 1 & 0 & 1 \end{array}\right]
\]

By reducing this matrix, so the left side becomes the identity, we obtain (verify)

\[
[I \mid \text{transition from old to new}] =
\left[\begin{array}{cc|cc} 1 & 0 & -1 & 2 \\ 0 & 1 & 1 & -1 \end{array}\right]
\]

so the transition matrix is

\[
P_{B \to B'} = \begin{bmatrix} -1 & 2 \\ 1 & -1 \end{bmatrix}
\]

which also agrees with the result in Example 1. ◄


Transition to the Standard Basis for R^n


Note that in part (a) of the last example the column vectors of the matrix that made 
the transition from the basis B' to the standard basis turned out to be the vectors in B' 
written in column form. This illustrates the following general result. 


THEOREM 4.6.2 Let B' = {u_1, u_2, ..., u_n} be any basis for the vector space R^n and
let S = {e_1, e_2, ..., e_n} be the standard basis for R^n. If the vectors in these bases are
written in column form, then

\[
P_{B' \to S} = [\mathbf{u}_1 \mid \mathbf{u}_2 \mid \cdots \mid \mathbf{u}_n] \tag{15}
\]


It follows from this theorem that if

\[
A = [\mathbf{u}_1 \mid \mathbf{u}_2 \mid \cdots \mid \mathbf{u}_n]
\]

is any invertible n × n matrix, then A can be viewed as the transition matrix from the
basis {u_1, u_2, ..., u_n} for R^n to the standard basis for R^n. Thus, for example, the matrix

\[
A = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 5 & 3 \\ 1 & 0 & 8 \end{bmatrix}
\]

which was shown to be invertible in Example 4 of Section 1.5, is the transition matrix
from the basis

u_1 = (1, 2, 1), u_2 = (2, 5, 0), u_3 = (3, 3, 8)

to the basis

e_1 = (1, 0, 0), e_2 = (0, 1, 0), e_3 = (0, 0, 1)

Exercise Set 4.6

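1. Consider the bases B = {u_1, u_2} and B' = {u'_1, u'_2} for R^2,
where

\[
\mathbf{u}_1 = \begin{bmatrix} 2 \\ 2 \end{bmatrix}, \quad
\mathbf{u}_2 = \begin{bmatrix} 4 \\ -1 \end{bmatrix}, \quad
\mathbf{u}'_1 = \begin{bmatrix} 1 \\ 3 \end{bmatrix}, \quad
\mathbf{u}'_2 = \begin{bmatrix} -1 \\ -1 \end{bmatrix}
\]

(a) Find the transition matrix from B' to B.

(b) Find the transition matrix from B to B'.

(c) Compute the coordinate vector [w]_B, where



and use (12) to compute [w]_{B'}.

(d) Check your work by computing [w]_{B'} directly.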


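2. Repeat the directions of Exercise 1 with the same vector w but
with

\[
\mathbf{u}_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \quad
\mathbf{u}_2 = \begin{bmatrix} 0 \\ 1 \end{bmatrix}
\]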



3. Consider the bases B = {u_1, u_2, u_3} and B' = {u'_1, u'_2, u'_3} for
R^3, where

\[
\mathbf{u}_1 = \begin{bmatrix} 2 \\ 1 \\ 1 \end{bmatrix}, \quad
\mathbf{u}_2 = \begin{bmatrix} 2 \\ -1 \\ 1 \end{bmatrix}, \quad
\mathbf{u}_3 = \begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix}
\]

\[
\mathbf{u}'_1 = \begin{bmatrix} 3 \\ 1 \\ -5 \end{bmatrix}, \quad
\mathbf{u}'_2 = \begin{bmatrix} 1 \\ 1 \\ -3 \end{bmatrix}, \quad
\mathbf{u}'_3 = \begin{bmatrix} -1 \\ 0 \\ 2 \end{bmatrix}
\]

(a) Find the transition matrix from B to B'.

(b) Compute the coordinate vector [w]_B, where

\[
\mathbf{w} = \begin{bmatrix} -5 \\ 8 \\ -5 \end{bmatrix}
\]

and use (12) to compute [w]_{B'}.

(c) Check your work by computing [w]_{B'} directly.


4. Repeat the directions of Exercise 3 with the same vector w, but
with

\[
\mathbf{u}_1 = \begin{bmatrix} -3 \\ 0 \\ -3 \end{bmatrix}, \quad
\mathbf{u}_2 = \begin{bmatrix} -3 \\ 2 \\ -1 \end{bmatrix}, \quad
\mathbf{u}_3 = \begin{bmatrix} 1 \\ 6 \\ -1 \end{bmatrix}
\]

\[
\mathbf{u}'_1 = \begin{bmatrix} -6 \\ -6 \\ 0 \end{bmatrix}, \quad
\mathbf{u}'_2 = \begin{bmatrix} -2 \\ -6 \\ 4 \end{bmatrix}, \quad
\mathbf{u}'_3 = \begin{bmatrix} -2 \\ -3 \\ 7 \end{bmatrix}
\]


5. Let V be the space spanned by f_1 = sin x and f_2 = cos x.

(a) Show that g_1 = 2 sin x + cos x and g_2 = 3 cos x form a
basis for V.

(b) Find the transition matrix from B' = {g_1, g_2} to
B = {f_1, f_2}.

(c) Find the transition matrix from B to B'.

(d) Compute the coordinate vector [h]_B, where
h = 2 sin x − 5 cos x, and use (12) to obtain [h]_{B'}.

(e) Check your work by computing [h]_{B'} directly.

6. Consider the bases B = {p_1, p_2} and B' = {q_1, q_2} for P_1,
where

p_1 = 6 + 3x, p_2 = 10 + 2x, q_1 = 2, q_2 = 3 + 2x

(a) Find the transition matrix from B' to B.

(b) Find the transition matrix from B to B'.

(c) Compute the coordinate vector [p]_B, where p = −4 + x,
and use (12) to compute [p]_{B'}.

(d) Check your work by computing [p]_{B'} directly.

7. Let B_1 = {u_1, u_2} and B_2 = {v_1, v_2} be the bases for R^2 in
which u_1 = (1, 2), u_2 = (2, 3), v_1 = (1, 3), and v_2 = (1, 4).

(a) Use Formula (14) to find the transition matrix P_{B_2→B_1}.

(b) Use Formula (14) to find the transition matrix P_{B_1→B_2}.

(c) Confirm that P_{B_2→B_1} and P_{B_1→B_2} are inverses of one
another.

(d) Let w = (0, 1). Find [w]_{B_1} and then use the matrix P_{B_1→B_2}
to compute [w]_{B_2} from [w]_{B_1}.

(e) Let w = (2, 5). Find [w]_{B_2} and then use the matrix P_{B_2→B_1}
to compute [w]_{B_1} from [w]_{B_2}.


8. Let S be the standard basis for R^2, and let B = {v_1, v_2} be the
basis in which v_1 = (2, 1) and v_2 = (−3, 4).

(a) Find the transition matrix P_{B→S} by inspection.

(b) Use Formula (14) to find the transition matrix P_{S→B}.

(c) Confirm that P_{B→S} and P_{S→B} are inverses of one another.

(d) Let w = (5, −3). Find [w]_B and then use Formula (11) to
compute [w]_S.

(e) Let w = (3, −5). Find [w]_S and then use Formula (12) to
compute [w]_B.

9. Let S be the standard basis for R^3, and let B = {v_1, v_2, v_3}
be the basis in which v_1 = (1, 2, 1), v_2 = (2, 5, 0), and
v_3 = (3, 3, 8).

(a) Find the transition matrix P_{B→S} by inspection.

(b) Use Formula (14) to find the transition matrix P_{S→B}.

(c) Confirm that P_{B→S} and P_{S→B} are inverses of one another.

(d) Let w = (5, −3, 1). Find [w]_B and then use Formula (11)
to compute [w]_S.

(e) Let w = (3, −5, 0). Find [w]_S and then use Formula (12)
to compute [w]_B.

10. Let S = {e_1, e_2} be the standard basis for R^2, and let
B = {v_1, v_2} be the basis that results when the vectors in S are
reflected about the line y = x.

(a) Find the transition matrix P_{B→S}.

(b) Let P = P_{B→S} and show that P^T = P_{S→B}.

11. Let S = {e_1, e_2} be the standard basis for R^2, and let
B = {v_1, v_2} be the basis that results when the vectors in S are
reflected about the line that makes an angle θ with the positive
x-axis.

(a) Find the transition matrix P_{B→S}.

(b) Let P = P_{B→S} and show that P^T = P_{S→B}.

12. If B_1, B_2, and B_3 are bases for R^2, and if

\[
P_{B_1 \to B_2} = \begin{bmatrix} 3 & 1 \\ 5 & 2 \end{bmatrix}
\quad \text{and} \quad
P_{B_2 \to B_3} = \begin{bmatrix} 7 & 2 \\ 4 & -1 \end{bmatrix}
\]

then P_{B_3→B_1} = ______.

13. If P is the transition matrix from a basis B' to a basis B, and
Q is the transition matrix from B to a basis C, what is the
transition matrix from B' to C? What is the transition matrix
from C to B'?

14. To write the coordinate vector for a vector, it is necessary to
specify an order for the vectors in the basis. If P is the transition
matrix from a basis B' to a basis B, what is the effect
on P if we reverse the order of vectors in B from v_1, ..., v_n to
v_n, ..., v_1? What is the effect on P if we reverse the order of
vectors in both B' and B?

15. Consider the matrix

\[
P = \begin{bmatrix} 1 & 1 & 0 \\ 1 & 0 & 2 \\ 0 & 2 & 1 \end{bmatrix}
\]

(a) P is the transition matrix from what basis B to the standard
basis S = {e_1, e_2, e_3} for R^3?

(b) P is the transition matrix from the standard basis
S = {e_1, e_2, e_3} to what basis B for R^3?

16. The matrix

\[
P = \begin{bmatrix} \ast & 0 & 0 \\ \ast & 3 & 2 \\ \ast & 1 & 1 \end{bmatrix}
\]

[the first column of this matrix is illegible in this copy] is the transition matrix from what basis B to the basis
{(1, 1, 1), (1, 1, 0), (1, 0, 0)} for R^3?

17. Let S = {e_1, e_2} be the standard basis for R^2, and let
B = {v_1, v_2} be the basis that results when the linear transformation
defined by

T(x_1, x_2) = (2x_1 + 3x_2, 5x_1 − x_2)

is applied to each vector in S. Find the transition matrix P_{B→S}.

18. Let S = {e_1, e_2, e_3} be the standard basis for R^3, and let
B = {v_1, v_2, v_3} be the basis that results when the linear transformation
defined by

T(x_1, x_2, x_3) = (x_1 + x_2, 2x_1 − x_2 + 4x_3, x_2 + 3x_3)

is applied to each vector in S. Find the transition matrix P_{B→S}.

19. If [w]_B = w holds for all vectors w in R^n, what can you say
about the basis B?

Working with Proofs

20. Let B be a basis for R^n. Prove that the vectors v_1, v_2, ..., v_k
span R^n if and only if the vectors [v_1]_B, [v_2]_B, ..., [v_k]_B
span R^n.

21. Let B be a basis for R^n. Prove that the vectors v_1, v_2, ..., v_k
form a linearly independent set in R^n if and only if the vectors
[v_1]_B, [v_2]_B, ..., [v_k]_B form a linearly independent set in R^n.

True-False Exercises

TF. In parts (a)-(f) determine whether the statement is true or
false, and justify your answer.

(a) If B_1 and B_2 are bases for a vector space V, then there exists a
transition matrix from B_1 to B_2.

(b) Transition matrices are invertible.

(c) If B is a basis for a vector space R^n, then P_{B→B} is the identity
matrix.

(d) If P_{B_1→B_2} is a diagonal matrix, then each vector in B_2 is a
scalar multiple of some vector in B_1.

(e) If each vector in B_2 is a scalar multiple of some vector in B_1,
then P_{B_1→B_2} is a diagonal matrix.

(f) If A is a square matrix, then A = P_{B_1→B_2} for some bases B_1
and B_2 for R^n.

Working with Technology

T1. Let

\[
P = \begin{bmatrix}
5 & 8 & 6 & -13 \\
3 & -1 & 0 & -9 \\
0 & 1 & -1 & 0 \\
2 & 4 & 3 & -5
\end{bmatrix}
\]

and

v_1 = (2, 4, 3, −5), v_2 = (0, 1, −1, 0),
v_3 = (3, −1, 0, −9), v_4 = (5, 8, 6, −13)

Find a basis B = {u_1, u_2, u_3, u_4} for R^4 for which P is the transition
matrix from B to B' = {v_1, v_2, v_3, v_4}.

T2. Given that the matrix for a linear transformation T: R^4 → R^4
relative to the standard basis B = {e_1, e_2, e_3, e_4} for R^4 is

\[
\begin{bmatrix}
1 & 2 & 0 & 1 \\
3 & 0 & -1 & 2 \\
2 & 5 & 3 & 1 \\
1 & 2 & 1 & 3
\end{bmatrix}
\]

find the matrix for T relative to the basis

B' = {e_1, e_1 + e_2, e_1 + e_2 + e_3, e_1 + e_2 + e_3 + e_4}

4.7 Row Space, Column Space, and Null Space

In this section we will study some important vector spaces that are associated with matrices. 
Our work here will provide us with a deeper understanding of the relationships between the 
solutions of a linear system and properties of its coefficient matrix. 


Row Space, Column Space, 
and Null Space 


Recall that vectors can be written in comma-delimited form or in matrix form as either 
row vectors or column vectors. In this section we will use the latter two. 


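DEFINITION 1 For an m × n matrix

\[
A = \begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{bmatrix}
\]

the vectors

\[
\begin{aligned}
\mathbf{r}_1 &= [a_{11} \quad a_{12} \quad \cdots \quad a_{1n}] \\
\mathbf{r}_2 &= [a_{21} \quad a_{22} \quad \cdots \quad a_{2n}] \\
&\ \ \vdots \\
\mathbf{r}_m &= [a_{m1} \quad a_{m2} \quad \cdots \quad a_{mn}]
\end{aligned}
\]

in R^n that are formed from the rows of A are called the row vectors of A, and the
vectors

\[
\mathbf{c}_1 = \begin{bmatrix} a_{11} \\ a_{21} \\ \vdots \\ a_{m1} \end{bmatrix}, \quad
\mathbf{c}_2 = \begin{bmatrix} a_{12} \\ a_{22} \\ \vdots \\ a_{m2} \end{bmatrix}, \quad \ldots, \quad
\mathbf{c}_n = \begin{bmatrix} a_{1n} \\ a_{2n} \\ \vdots \\ a_{mn} \end{bmatrix}
\]

in R^m formed from the columns of A are called the column vectors of A.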


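► EXAMPLE 1 Row and Column Vectors of a 2 × 3 Matrix

Let

\[
A = \begin{bmatrix} 2 & 1 & 0 \\ 3 & -1 & 4 \end{bmatrix}
\]

The row vectors of A are

\[
\mathbf{r}_1 = [2 \quad 1 \quad 0] \quad \text{and} \quad \mathbf{r}_2 = [3 \quad -1 \quad 4]
\]

and the column vectors of A are

\[
\mathbf{c}_1 = \begin{bmatrix} 2 \\ 3 \end{bmatrix}, \quad
\mathbf{c}_2 = \begin{bmatrix} 1 \\ -1 \end{bmatrix}, \quad \text{and} \quad
\mathbf{c}_3 = \begin{bmatrix} 0 \\ 4 \end{bmatrix}
\]

◄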


The following definition defines three important vector spaces associated with a 
matrix. 


We will sometimes denote the 
row space of A, the column 
space of A, and the null space 
of A by row(A), col(A), and 
null(A), respectively. 


DEFINITION 2 If A is an m × n matrix, then the subspace of R^n spanned by the
row vectors of A is called the row space of A, and the subspace of R^m spanned by
the column vectors of A is called the column space of A. The solution space of the
homogeneous system of equations Ax = 0, which is a subspace of R^n, is called the
null space of A.

In this section and the next we will be concerned with two general questions:

Question 1. What relationships exist among the solutions of a linear system Ax = b
and the row space, column space, and null space of the coefficient matrix A?

Question 2. What relationships exist among the row space, column space, and null
space of a matrix?

Starting with the first question, suppose that

\[
A = \begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{bmatrix}
\quad \text{and} \quad
\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}
\]

It follows from Formula (10) of Section 1.3 that if c_1, c_2, ..., c_n denote the column
vectors of A, then the product Ax can be expressed as a linear combination of these
vectors with coefficients from x; that is,

\[
A\mathbf{x} = x_1\mathbf{c}_1 + x_2\mathbf{c}_2 + \cdots + x_n\mathbf{c}_n \tag{1}
\]

Thus, a linear system, Ax = b, of m equations in n unknowns can be written as

\[
x_1\mathbf{c}_1 + x_2\mathbf{c}_2 + \cdots + x_n\mathbf{c}_n = \mathbf{b} \tag{2}
\]

from which we conclude that Ax = b is consistent if and only if b is expressible as a linear
combination of the column vectors of A. This yields the following theorem.


THEOREM 4.7.1 A system of linear equations Ax = b is consistent if and only if b is in
the column space of A.


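► EXAMPLE 2 A Vector b in the Column Space of A

Let Ax = b be the linear system

\[
\begin{bmatrix} -1 & 3 & 2 \\ 1 & 2 & -3 \\ 2 & 1 & -2 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
= \begin{bmatrix} 1 \\ -9 \\ -3 \end{bmatrix}
\]

Show that b is in the column space of A by expressing it as a linear combination of the
column vectors of A.

Solution Solving the system by Gaussian elimination yields (verify)

x_1 = 2, x_2 = −1, x_3 = 3

It follows from this and Formula (2) that

\[
2\begin{bmatrix} -1 \\ 1 \\ 2 \end{bmatrix}
- \begin{bmatrix} 3 \\ 2 \\ 1 \end{bmatrix}
+ 3\begin{bmatrix} 2 \\ -3 \\ -2 \end{bmatrix}
= \begin{bmatrix} 1 \\ -9 \\ -3 \end{bmatrix}
\]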
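The "(verify)" step in Example 2 can be carried out with a short script. The sketch below is an illustration under stated assumptions rather than a method taken from the text: `solve_square` is a hypothetical helper that performs Gauss-Jordan elimination on [A | b] and assumes A is square and invertible, which holds for this system.

```python
from fractions import Fraction

def solve_square(A, b):
    """Solve Ax = b by Gauss-Jordan elimination; assumes A is square and invertible."""
    n = len(A)
    M = [[Fraction(x) for x in row] + [Fraction(bi)] for row, bi in zip(A, b)]
    for col in range(n):
        # Pick a row with a nonzero entry in this column and move it into place.
        pr = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pr] = M[pr], M[col]
        pivot = M[col][col]
        M[col] = [x / pivot for x in M[col]]
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [M[i][n] for i in range(n)]

A = [[-1, 3, 2],
     [1, 2, -3],
     [2, 1, -2]]
b = [1, -9, -3]

x = solve_square(A, b)

# Formula (2): b should equal x_1 c_1 + x_2 c_2 + x_3 c_3, where c_j is column j of A.
combination = [sum(x[j] * A[i][j] for j in range(3)) for i in range(3)]
print(x, combination == b)
```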


Recall from Theorem 3.4.4 that the general solution of a consistent linear system 
Ax = b can be obtained by adding any specific solution of the system to the general 
solution of the corresponding homogeneous system Ax = 0. Keeping in mind that the 
null space of A is the same as the solution space of Ax = 0, we can rephrase that theorem 
in the following vector form.

THEOREM 4.7.2 If x_0 is any solution of a consistent linear system Ax = b, and if
S = {v_1, v_2, ..., v_k} is a basis for the null space of A, then every solution of Ax = b can
be expressed in the form

\[
\mathbf{x} = \mathbf{x}_0 + c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_k\mathbf{v}_k \tag{3}
\]

Conversely, for all choices of scalars c_1, c_2, ..., c_k, the vector x in this formula is a
solution of Ax = b.


The vector x_0 in Formula (3) is called a particular solution of Ax = b, and the remaining
part of the formula is called the general solution of Ax = 0. With this terminology
Theorem 4.7.2 can be rephrased as:


The general solution of a consistent linear system can be expressed as the sum of a partic- 
ular solution of that system and the general solution of the corresponding homogeneous 
system. 


Geometrically, the solution set of Ax = b can be viewed as the translation by x 0 of the 
solution space of Ax = 0 (Figure 4.7.1). 
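A small numerical experiment makes this structure concrete. The sketch below uses the 4 × 6 system from Section 3.4 that the next example revisits, together with its particular solution and null space basis; the names `mat_vec` and `general_solution` are hypothetical helpers, and the grid of parameter values is an arbitrary choice for illustration.

```python
from fractions import Fraction
from itertools import product

A = [[1, 3, -2, 0, 2, 0],
     [2, 6, -5, -2, 4, -3],
     [0, 0, 5, 10, 0, 15],
     [2, 6, 0, 8, 4, 18]]
b = [0, -1, 5, 6]

x0 = [0, 0, 0, 0, 0, Fraction(1, 3)]        # a particular solution of Ax = b
null_basis = [[-3, 1, 0, 0, 0, 0],          # a basis for the null space of A
              [-4, 0, -2, 1, 0, 0],
              [-2, 0, 0, 0, 1, 0]]

def mat_vec(M, x):
    """Multiply a matrix (list of rows) by a column vector."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def general_solution(coeffs):
    """Formula (3): x = x0 + c1*v1 + c2*v2 + c3*v3."""
    x = list(x0)
    for c, v in zip(coeffs, null_basis):
        x = [xi + c * vi for xi, vi in zip(x, v)]
    return x

# Every choice of coefficients on a small grid should give a solution of Ax = b.
all_ok = all(mat_vec(A, general_solution(c)) == b
             for c in product(range(-2, 3), repeat=3))
print(all_ok)
```

A finite grid of coefficients is of course not a proof; it only illustrates the "conversely" part of Theorem 4.7.2 on sample values.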



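► EXAMPLE 3 General Solution of a Linear System Ax = b

In the concluding subsection of Section 3.4 we compared solutions of the linear systems

\[
\begin{bmatrix}
1 & 3 & -2 & 0 & 2 & 0 \\
2 & 6 & -5 & -2 & 4 & -3 \\
0 & 0 & 5 & 10 & 0 & 15 \\
2 & 6 & 0 & 8 & 4 & 18
\end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \end{bmatrix}
= \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}
\]

and

\[
\begin{bmatrix}
1 & 3 & -2 & 0 & 2 & 0 \\
2 & 6 & -5 & -2 & 4 & -3 \\
0 & 0 & 5 & 10 & 0 & 15 \\
2 & 6 & 0 & 8 & 4 & 18
\end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \end{bmatrix}
= \begin{bmatrix} 0 \\ -1 \\ 5 \\ 6 \end{bmatrix}
\]

and deduced that the general solution x of the nonhomogeneous system and the general
solution x_h of the corresponding homogeneous system (when written in column-vector
form) are related by

\[
\underbrace{\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \end{bmatrix}}_{\mathbf{x}}
= \begin{bmatrix} -3r - 4s - 2t \\ r \\ -2s \\ s \\ t \\ \tfrac{1}{3} \end{bmatrix}
= \underbrace{\begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ \tfrac{1}{3} \end{bmatrix}}_{\mathbf{x}_0}
+ \underbrace{r\begin{bmatrix} -3 \\ 1 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}
+ s\begin{bmatrix} -4 \\ 0 \\ -2 \\ 1 \\ 0 \\ 0 \end{bmatrix}
+ t\begin{bmatrix} -2 \\ 0 \\ 0 \\ 0 \\ 1 \\ 0 \end{bmatrix}}_{\mathbf{x}_h}
\]

◄


Recall from the Remark following Example 3 of Section 4.5 that the vectors in x_h
form a basis for the solution space of Ax = 0.


Bases for Row Spaces, Column Spaces, and Null Spaces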

We know that performing elementary row operations on the augmented matrix [A | b] 
of a linear system does not change the solution set of that system. This is true, in 
particular, if the system is homogeneous, in which case the augmented matrix is [A | 0]. 
But elementary row operations have no effect on the column of zeros, so it follows that 
the solution set of Ax = 0 is unaffected by performing elementary row operations on A 
itself. Thus, we have the following theorem. 


THEOREM 4.7.3 Elementary row operations do not change the null space of a matrix.


The following theorem, whose proof is left as an exercise, is a companion to Theo- 
rem 4.7.3. 


THEOREM 4.7.4 Elementary row operations do not change the row space of a matrix.


Theorems 4.7.3 and 4.7.4 might tempt you into incorrectly believing that elementary 
row operations do not change the column space of a matrix. To see why this is not true, 
compare the matrices 


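\[
A = \begin{bmatrix} 1 & 3 \\ 2 & 6 \end{bmatrix}
\quad \text{and} \quad
B = \begin{bmatrix} 1 & 3 \\ 0 & 0 \end{bmatrix}
\]

The matrix B can be obtained from A by adding −2 times the first row to the second.
However, this operation has changed the column space of A, since that column space
consists of all scalar multiples of

\[
\begin{bmatrix} 1 \\ 2 \end{bmatrix}
\]

whereas the column space of B consists of all scalar multiples of

\[
\begin{bmatrix} 1 \\ 0 \end{bmatrix}
\]

and the two are different spaces.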


► EXAMPLE 4 Finding a Basis for the Null Space of a Matrix

Find a basis for the null space of the matrix

\[
A = \begin{bmatrix}
1 & 3 & -2 & 0 & 2 & 0 \\
2 & 6 & -5 & -2 & 4 & -3 \\
0 & 0 & 5 & 10 & 0 & 15 \\
2 & 6 & 0 & 8 & 4 & 18
\end{bmatrix}
\]

Solution The null space of A is the solution space of the homogeneous linear system
Ax = 0, which, as shown in Example 3, has the basis

\[
\mathbf{v}_1 = \begin{bmatrix} -3 \\ 1 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}, \quad
\mathbf{v}_2 = \begin{bmatrix} -4 \\ 0 \\ -2 \\ 1 \\ 0 \\ 0 \end{bmatrix}, \quad
\mathbf{v}_3 = \begin{bmatrix} -2 \\ 0 \\ 0 \\ 0 \\ 1 \\ 0 \end{bmatrix}
\]

◄

Remark Observe that the basis vectors v_1, v_2, and v_3 in the last example are the vectors that result
by successively setting one of the parameters in the general solution equal to 1 and the others equal
to 0.
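The observation in this Remark translates directly into a procedure: reduce A to reduced row echelon form, then set each free variable to 1 in turn (and the other free variables to 0) and solve for the leading variables. The sketch below is one way to do this, with hypothetical function names `rref_with_pivots` and `null_space_basis` and exact fractions; it is an illustration, not the text's own algorithm.

```python
from fractions import Fraction

def rref_with_pivots(M):
    """Return (R, pivots): the reduced row echelon form of M and its pivot columns."""
    A = [[Fraction(x) for x in row] for row in M]
    rows, cols = len(A), len(A[0])
    pivots, pivot_row = [], 0
    for col in range(cols):
        if pivot_row == rows:
            break
        pr = next((r for r in range(pivot_row, rows) if A[r][col] != 0), None)
        if pr is None:
            continue                       # no pivot here: this is a free column
        A[pivot_row], A[pr] = A[pr], A[pivot_row]
        pivot = A[pivot_row][col]
        A[pivot_row] = [x / pivot for x in A[pivot_row]]
        for r in range(rows):
            if r != pivot_row and A[r][col] != 0:
                factor = A[r][col]
                A[r] = [x - factor * y for x, y in zip(A[r], A[pivot_row])]
        pivots.append(col)
        pivot_row += 1
    return A, pivots

def null_space_basis(M):
    """One basis vector per free variable: that variable is 1, the other free ones are 0."""
    R, pivots = rref_with_pivots(M)
    n = len(M[0])
    free = [j for j in range(n) if j not in pivots]
    basis = []
    for f in free:
        v = [Fraction(0)] * n
        v[f] = Fraction(1)
        # The i-th nonzero row of R determines the i-th leading variable.
        for row, p in zip(R, pivots):
            v[p] = -row[f]
        basis.append(v)
    return basis

A = [[1, 3, -2, 0, 2, 0],
     [2, 6, -5, -2, 4, -3],
     [0, 0, 5, 10, 0, 15],
     [2, 6, 0, 8, 4, 18]]
for v in null_space_basis(A):
    print([int(x) for x in v])
```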


The following theorem makes it possible to find bases for the row and column spaces 
of a matrix in row echelon form by inspection. 


THEOREM 4.7.5 If a matrix R is in row echelon form, then the row vectors with the
leading 1's (the nonzero row vectors) form a basis for the row space of R, and the column
vectors with the leading 1's of the row vectors form a basis for the column space of R.


The proof essentially involves an analysis of the positions of the 0's and 1's of R. We
omit the details.


► EXAMPLE 5 Bases for the Row and Column Spaces of a Matrix in Row
Echelon Form

Find bases for the row and column spaces of the matrix

\[
R = \begin{bmatrix}
1 & -2 & 5 & 0 & 3 \\
0 & 1 & 3 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0
\end{bmatrix}
\]

Solution Since the matrix R is in row echelon form, it follows from Theorem 4.7.5 that
the vectors

r_1 = [1 −2 5 0 3]

r_2 = [0 1 3 0 0]

r_3 = [0 0 0 1 0]

form a basis for the row space of R, and the vectors

\[
\mathbf{c}_1 = \begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \end{bmatrix}, \quad
\mathbf{c}_2 = \begin{bmatrix} -2 \\ 1 \\ 0 \\ 0 \end{bmatrix}, \quad
\mathbf{c}_4 = \begin{bmatrix} 0 \\ 0 \\ 1 \\ 0 \end{bmatrix}
\]

form a basis for the column space of R. ◄


► EXAMPLE 6 Basis for a Row Space by Row Reduction

Find a basis for the row space of the matrix

\[
A = \begin{bmatrix}
1 & -3 & 4 & -2 & 5 & 4 \\
2 & -6 & 9 & -1 & 8 & 2 \\
2 & -6 & 9 & -1 & 9 & 7 \\
-1 & 3 & -4 & 2 & -5 & -4
\end{bmatrix}
\]

Solution Since elementary row operations do not change the row space of a matrix, we
can find a basis for the row space of A by finding a basis for the row space of any row
echelon form of A. Reducing A to row echelon form, we obtain (verify)

\[
R = \begin{bmatrix}
1 & -3 & 4 & -2 & 5 & 4 \\
0 & 0 & 1 & 3 & -2 & -6 \\
0 & 0 & 0 & 0 & 1 & 5 \\
0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}
\]

By Theorem 4.7.5, the nonzero row vectors of R form a basis for the row space of R and
hence form a basis for the row space of A. These basis vectors are

r_1 = [1 −3 4 −2 5 4]

r_2 = [0 0 1 3 −2 −6]

r_3 = [0 0 0 0 1 5] ◄


Basis for the Column Space of a Matrix

Although elementary row operations can change the column space of a matrix, it
follows from Theorem 4.7.6(b) that they do not change the
dimension of its column space.


The problem of finding a basis for the column space of a matrix A in Example 6 is 
complicated by the fact that an elementary row operation can alter its column space. 
However, the good news is that elementary row operations do not alter dependence relation- 
ships among the column vectors. To make this more precise, suppose that wi , w 2 , . . . , w* 
are linearly dependent column vectors of A, so there are scalars c\, c 2 , . . . ,c k that are 
not all zero and such that 


ciwi + c 2 w 2 H b CkWk = 0 (4) 

If we perform an elementary row operation on A, then these vectors will be changed 
into new column vectors w\,W 2 , ■ ■ ■ ,W k . At first glance it would seem possible that the 
transformed vectors might be linearly independent. However, this is not so, since it can 
be proved that these new column vectors are linearly dependent and, in fact, related by 
an equation 

Ciw'i + C 2 w 2 H b c k W k = 0 

that has exactly the same coefficients as (4). It can also be proved that elementary row 
operations do not alter the linear independence of a set of column vectors. All of these 
results are summarized in the following theorem. 


!EM 4.7.6 If A and B are row equivalent matrices, them. 

(a) A given set of column vectors of A is linearly independent if and only if the corre- 
sponding column vectors of B are linearly independent. 

(b) A given set of column vectors of A forms a basis for the column space of A if and 
only if the corresponding column vectors of B form a basis for the column space 
of B. 
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► EXAMPLE 7 Basis for a Column Space by Row Reduction 

Find a basis for the column space of the matrix 

" 1 -3 4 -2 5 4" 

2-6 9-1 8 2 

2-6 9-1 9 7 

-1 3-4 2 -5 -4 


that consists of column vectors of A. 


Solution We observed in Example 6 that the matrix 

"1 -3 4 -2 5 4 

0 0 1 3-2-6 

R = 

0 0 0 0 1 5 

0 0 0 0 0 0 


is a row echelon form of A. Keeping in mind that A and R can have different column 
spaces, we cannot find a basis for the column space of A directly from the 
column vectors of R. However, it follows from Theorem 4. 7. 6(6) that if we can find 
a set of column vectors of R that forms a basis for the column space of R , then the 
corresponding column vectors of A will form a basis for the column space of A. 

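Since the first, third, and fifth columns of R contain the leading 1's of the row vectors, 
the vectors 

$$\mathbf{c}'_1 = \begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \end{bmatrix}, \quad \mathbf{c}'_3 = \begin{bmatrix} 4 \\ 1 \\ 0 \\ 0 \end{bmatrix}, \quad \mathbf{c}'_5 = \begin{bmatrix} 5 \\ -2 \\ 1 \\ 0 \end{bmatrix}$$

form a basis for the column space of R. Thus, the corresponding column vectors of A, 
which are 

$$\mathbf{c}_1 = \begin{bmatrix} 1 \\ 2 \\ 2 \\ -1 \end{bmatrix}, \quad \mathbf{c}_3 = \begin{bmatrix} 4 \\ 9 \\ 9 \\ -4 \end{bmatrix}, \quad \mathbf{c}_5 = \begin{bmatrix} 5 \\ 8 \\ 9 \\ -5 \end{bmatrix}$$

form a basis for the column space of A. 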
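
The method of Example 7 can be checked numerically. The following is a minimal sketch, not part of the text's own toolkit: the helper `rref` is a hypothetical name introduced here, and it uses exact fractions so that no rounding interferes with locating the leading 1's. It reduces A and then picks out the columns of A (not of the reduced matrix) at the pivot positions.

```python
from fractions import Fraction

def rref(rows):
    """Return (R, pivots): reduced row echelon form and 0-based pivot columns."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(m[0])):
        if r == len(m):
            break
        # Look for a nonzero entry in column c at or below row r.
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        lead = m[r][c]
        m[r] = [x / lead for x in m[r]]          # scale to get a leading 1
        for i in range(len(m)):
            if i != r and m[i][c] != 0:          # clear the rest of column c
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
    return m, pivots

# The matrix A of Example 7.
A = [[1, -3, 4, -2, 5, 4],
     [2, -6, 9, -1, 8, 2],
     [2, -6, 9, -1, 9, 7],
     [-1, 3, -4, 2, -5, -4]]

_, pivots = rref(A)
# Basis vectors are the columns of A itself at the pivot positions.
basis = [[row[c] for row in A] for c in pivots]
print("pivot columns (1-based):", [c + 1 for c in pivots])
print("basis for col(A):", basis)
```

The sketch uses the reduced row echelon form rather than just a row echelon form; the pivot positions are the same in either case, so the selected columns agree with those found in the example.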


Up to now we have focused on methods for finding bases associated with matrices. 
Those methods can readily be adapted to the more general problem of finding a basis 
for the subspace spanned by a set of vectors in $R^n$. 

► EXAMPLE 8 Basis for the Space Spanned by a Set of Vectors 

The following vectors span a subspace of $R^4$. Find a subset of these vectors that forms 
a basis of this subspace. 

$\mathbf{v}_1 = (1, 2, 2, -1)$, $\mathbf{v}_2 = (-3, -6, -6, 3)$, 

$\mathbf{v}_3 = (4, 9, 9, -4)$, $\mathbf{v}_4 = (-2, -1, -1, 2)$, 

$\mathbf{v}_5 = (5, 8, 9, -5)$, $\mathbf{v}_6 = (4, 2, 7, -4)$ 

Solution If we rewrite these vectors in column form and construct the matrix that has 
those vectors as its successive columns, then we obtain the matrix A in Example 7 (verify). 
Thus, 

$$\text{span}\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3, \mathbf{v}_4, \mathbf{v}_5, \mathbf{v}_6\} = \text{col}(A)$$

Proceeding as in that example (and adjusting the notation appropriately), we see that 
the vectors $\mathbf{v}_1$, $\mathbf{v}_3$, and $\mathbf{v}_5$ form a basis for 

$$\text{span}\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3, \mathbf{v}_4, \mathbf{v}_5, \mathbf{v}_6\}$$

Bases Formed from Row and Column Vectors of a Matrix 

In Example 6 , we found a basis for the row space of a matrix by reducing that matrix 
to row echelon form. However, the basis vectors produced by that method were not all 
row vectors of the original matrix. The following adaptation of the technique used in 
Example 7 shows how to find a basis for the row space of a matrix that consists entirely 
of row vectors of that matrix. 


► EXAMPLE 9 Basis for the Row Space of a Matrix 


Find a basis for the row space of 

$$A = \begin{bmatrix} 1 & -2 & 0 & 0 & 3 \\ 2 & -5 & -3 & -2 & 6 \\ 0 & 5 & 15 & 10 & 0 \\ 2 & 6 & 18 & 8 & 6 \end{bmatrix}$$

consisting entirely of row vectors from A. 


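Solution We will transpose A, thereby converting the row space of A into the column 
space of $A^T$; then we will use the method of Example 7 to find a basis for the column 
space of $A^T$; and then we will transpose again to convert column vectors back to row 
vectors. 

Transposing A yields 

$$A^T = \begin{bmatrix} 1 & 2 & 0 & 2 \\ -2 & -5 & 5 & 6 \\ 0 & -3 & 15 & 18 \\ 0 & -2 & 10 & 8 \\ 3 & 6 & 0 & 6 \end{bmatrix}$$

and then reducing this matrix to row echelon form we obtain 

$$\begin{bmatrix} 1 & 2 & 0 & 2 \\ 0 & 1 & -5 & -10 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}$$

The first, second, and fourth columns contain the leading 1's, so the corresponding 
column vectors in $A^T$ form a basis for the column space of $A^T$; these are 

$$\mathbf{c}_1 = \begin{bmatrix} 1 \\ -2 \\ 0 \\ 0 \\ 3 \end{bmatrix}, \quad \mathbf{c}_2 = \begin{bmatrix} 2 \\ -5 \\ -3 \\ -2 \\ 6 \end{bmatrix}, \quad \text{and} \quad \mathbf{c}_4 = \begin{bmatrix} 2 \\ 6 \\ 18 \\ 8 \\ 6 \end{bmatrix}$$

Transposing again and adjusting the notation appropriately yields the basis vectors 

$\mathbf{r}_1 = [1 \;\; -2 \;\; 0 \;\; 0 \;\; 3]$, $\mathbf{r}_2 = [2 \;\; -5 \;\; -3 \;\; -2 \;\; 6]$, 

$\mathbf{r}_4 = [2 \;\; 6 \;\; 18 \;\; 8 \;\; 6]$ 

for the row space of A. 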
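
The transpose trick of Example 9 can be sketched in a few lines. This is an illustrative check under the assumption that exact fraction arithmetic is acceptable; `rref` is a hypothetical helper defined here, not a routine from the text. The pivot columns of $A^T$ identify which rows of A to keep.

```python
from fractions import Fraction

def rref(rows):
    """Return (R, pivots): reduced row echelon form and 0-based pivot columns."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(m[0])):
        if r == len(m):
            break
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        lead = m[r][c]
        m[r] = [x / lead for x in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
    return m, pivots

# The matrix A of Example 9.
A = [[1, -2, 0, 0, 3],
     [2, -5, -3, -2, 6],
     [0, 5, 15, 10, 0],
     [2, 6, 18, 8, 6]]

A_T = [list(col) for col in zip(*A)]   # transpose: rows of A become columns
_, pivots = rref(A_T)
row_basis = [A[i] for i in pivots]     # pivot column i of A^T is row i of A
print("rows of A kept (1-based):", [i + 1 for i in pivots])
print("basis for row(A):", row_basis)
```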
Next we will give an example that adapts the method of Example 7 to solve the 
following general problem in $R^n$: 

Problem Given a set of vectors $S = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k\}$ in $R^n$, find a subset of these 
vectors that forms a basis for span(S), and express each vector that is not in that basis 
as a linear combination of the basis vectors. 


► EXAMPLE 10 Basis and Linear Combinations 

(a) Find a subset of the vectors 

$\mathbf{v}_1 = (1, -2, 0, 3)$, $\mathbf{v}_2 = (2, -5, -3, 6)$, 
$\mathbf{v}_3 = (0, 1, 3, 0)$, $\mathbf{v}_4 = (2, -1, 4, -7)$, $\mathbf{v}_5 = (5, -8, 1, 2)$ 

that forms a basis for the subspace of $R^4$ spanned by these vectors. 

(b) Express each vector not in the basis as a linear combination of the basis vectors. 

Solution (a) We begin by constructing a matrix that has $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_5$ as its column 
vectors (in that order, left to right): 

$$\begin{bmatrix} 1 & 2 & 0 & 2 & 5 \\ -2 & -5 & 1 & -1 & -8 \\ 0 & -3 & 3 & 4 & 1 \\ 3 & 6 & 0 & -7 & 2 \end{bmatrix} \tag{5}$$

The first part of our problem can be solved by finding a basis for the column space of 
this matrix. Reducing the matrix to reduced row echelon form and denoting the column 
vectors of the resulting matrix by $\mathbf{w}_1, \mathbf{w}_2, \mathbf{w}_3, \mathbf{w}_4$, and $\mathbf{w}_5$ yields 

$$\begin{bmatrix} 1 & 0 & 2 & 0 & 1 \\ 0 & 1 & -1 & 0 & 1 \\ 0 & 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix} \tag{6}$$

The leading 1's occur in columns 1, 2, and 4, so by Theorem 4.7.5, 

$$\{\mathbf{w}_1, \mathbf{w}_2, \mathbf{w}_4\}$$

is a basis for the column space of (6), and consequently, 

$$\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_4\}$$

is a basis for the column space of (5). 

Solution (b) We will start by expressing $\mathbf{w}_3$ and $\mathbf{w}_5$ as linear combinations of the basis 
vectors $\mathbf{w}_1, \mathbf{w}_2, \mathbf{w}_4$. The simplest way of doing this is to express $\mathbf{w}_3$ and $\mathbf{w}_5$ in terms 
of basis vectors with smaller subscripts. Accordingly, we will express $\mathbf{w}_3$ as a linear 
combination of $\mathbf{w}_1$ and $\mathbf{w}_2$, and we will express $\mathbf{w}_5$ as a linear combination of $\mathbf{w}_1$, $\mathbf{w}_2$, 
and $\mathbf{w}_4$. By inspection of (6), these linear combinations are 

$$\mathbf{w}_3 = 2\mathbf{w}_1 - \mathbf{w}_2$$

$$\mathbf{w}_5 = \mathbf{w}_1 + \mathbf{w}_2 + \mathbf{w}_4$$

We call these the dependency equations. The corresponding relationships in (5) are 

$$\mathbf{v}_3 = 2\mathbf{v}_1 - \mathbf{v}_2$$

$$\mathbf{v}_5 = \mathbf{v}_1 + \mathbf{v}_2 + \mathbf{v}_4$$

Had we only been interested in part (a) of this example, it would have sufficed to reduce 
the matrix to row echelon form. It is for part (b) that the reduced row echelon form 
is most useful. 

The following is a summary of the steps that we followed in our last example to solve 
the problem posed above. 

Basis for the Space Spanned by a Set of Vectors 

Step 1. Form the matrix A whose columns are the vectors in the set $S = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k\}$. 

Step 2. Reduce the matrix A to reduced row echelon form R. 

Step 3. Denote the column vectors of R by $\mathbf{w}_1, \mathbf{w}_2, \ldots, \mathbf{w}_k$. 

Step 4. Identify the columns of R that contain the leading 1's. The corresponding 
column vectors of A form a basis for span(S). 

This completes the first part of the problem. 

Step 5. Obtain a set of dependency equations for the column vectors $\mathbf{w}_1, \mathbf{w}_2, \ldots, \mathbf{w}_k$ 
of R by successively expressing each $\mathbf{w}_i$ that does not contain a leading 1 of 
R as a linear combination of predecessors that do. 

Step 6. In each dependency equation obtained in Step 5, replace the vector $\mathbf{w}_i$ by the 
vector $\mathbf{v}_i$ for $i = 1, 2, \ldots, k$. 

This completes the second part of the problem. 
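
The six steps above can be sketched as a short program. This is one possible rendering under stated assumptions, not a definitive implementation: `rref` is a hypothetical helper introduced here, vectors are indexed from 0 in the code, and exact fractions are used so the leading 1's are identified without rounding. The data are the vectors of Example 10.

```python
from fractions import Fraction

def rref(rows):
    """Return (R, pivots): reduced row echelon form and 0-based pivot columns."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(m[0])):
        if r == len(m):
            break
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        lead = m[r][c]
        m[r] = [x / lead for x in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
    return m, pivots

vectors = [(1, -2, 0, 3), (2, -5, -3, 6), (0, 1, 3, 0),
           (2, -1, 4, -7), (5, -8, 1, 2)]

# Step 1: the vectors become the columns of A.
A = [list(row) for row in zip(*vectors)]
# Steps 2-4: reduce, then read off the columns with leading 1's.
R, pivots = rref(A)
# Steps 5-6: column j of R lists the coefficients of v_j on the basis vectors.
deps = {}
for j in range(len(vectors)):
    if j not in pivots:
        deps[j] = {p: R[i][j] for i, p in enumerate(pivots) if R[i][j] != 0}

print("basis:", ["v%d" % (p + 1) for p in pivots])
for j, coeffs in deps.items():
    terms = " + ".join("(%s)v%d" % (c, p + 1) for p, c in coeffs.items())
    print("v%d = %s" % (j + 1, terms))
```

Reading the coefficients straight from the non-pivot columns works only because R is in reduced row echelon form, which is the point made in the margin note to Example 10.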


Exercise Set 4.7 


In Exercises 1-2, express the product Ax as a linear combina- 
tion of the column vectors of A. 


1. (a) 


' 2 
-1 



"4 0 -f 


"-2" 

(b) 

3 6 2 


3 


0-1 4 


5 


2. (a) 


-3 

5 

2 

1 


6 

-4 

3 

8 




In Exercises 3-4, determine whether b is in the column space 
of A, and if so, express b as a linear combination of the column 
vectors of A. 

3. (a) $A = \begin{bmatrix} 1 & 1 & 2 \\ 1 & 0 & 1 \\ 2 & 1 & 3 \end{bmatrix}$; $\mathbf{b} = \begin{bmatrix} -1 \\ 0 \\ 2 \end{bmatrix}$ 

(b) $A = \begin{bmatrix} 1 & -1 & 1 \\ 9 & 3 & 1 \\ 1 & 1 & 1 \end{bmatrix}$; $\mathbf{b} = \begin{bmatrix} 5 \\ 1 \\ -1 \end{bmatrix}$ 

4. (a) $A = \begin{bmatrix} 1 & -1 & 1 \\ -1 & 1 & -1 \\ -1 & -1 & 1 \end{bmatrix}$; $\mathbf{b} = \begin{bmatrix} 2 \\ 0 \\ 0 \end{bmatrix}$ 

(b) $A = \begin{bmatrix} 1 & 2 & 0 & 1 \\ 0 & 1 & 2 & 1 \\ 1 & 2 & 1 & 3 \\ 0 & 1 & 2 & 2 \end{bmatrix}$; $\mathbf{b} = \begin{bmatrix} 4 \\ 3 \\ 5 \\ 7 \end{bmatrix}$ 

5. Suppose that $x_1 = 3$, $x_2 = 0$, $x_3 = -1$, $x_4 = 5$ is a solution of 
a nonhomogeneous linear system Ax = b and that the solution set of the 
homogeneous system Ax = 0 is given by the formulas 

$x_1 = 5r - 2s$, $x_2 = s$, $x_3 = s + t$, $x_4 = t$ 

(a) Find a vector form of the general solution of Ax = 0. 

(b) Find a vector form of the general solution of Ax = b. 

6. Suppose that $x_1 = -1$, $x_2 = 2$, $x_3 = 4$, $x_4 = -3$ is a solution 
of a nonhomogeneous linear system Ax = b and that the solution set of the 
homogeneous system Ax = 0 is given by the formulas 

$x_1 = -3r + 4s$, $x_2 = r - s$, $x_3 = r$, $x_4 = s$ 

(a) Find a vector form of the general solution of Ax = 0. 

(b) Find a vector form of the general solution of Ax = b. 


In Exercises 7-8, find the vector form of the general solution 
of the linear system Ax = b, and then use that result to find the 
vector form of the general solution of Ax = 0. 

7. (a) 
$x_1 - 3x_2 = 1$ 
$2x_1 - 6x_2 = 2$ 

(b) 
$x_1 + x_2 + 2x_3 = 5$ 
$x_1 + x_3 = -2$ 
$2x_1 + x_2 + 3x_3 = 3$ 

8. (a) 
$x_1 - 2x_2 + x_3 + 2x_4 = -1$ 
$2x_1 - 4x_2 + 2x_3 + 4x_4 = -2$ 
$-x_1 + 2x_2 - x_3 - 2x_4 = 1$ 
$3x_1 - 6x_2 + 3x_3 + 6x_4 = -3$ 

(b) 
$x_1 + 2x_2 - 3x_3 + x_4 = 4$ 
$-2x_1 + x_2 + 2x_3 + x_4 = -1$ 
$-x_1 + 3x_2 - x_3 + 2x_4 = 3$ 
$4x_1 - 7x_2 - 5x_4 = -5$ 

In Exercises 9-10, find bases for the null space and row space 
of A. 


9. (a) A = 


10. (a) A = 


(b) A = 


1 -1 3 

5 -4 -4 (b) A = 

7-6 2 


14 5 2 

2 13 0 

-13 2 2 

1 4 5 6 9 

3-2 1 4-1 

-1 0 -1 -2 -1 

2 3 5 7 8 


0 -1 

0 -2 

0 

In Exercises 11-12, a matrix in row echelon form is given. By 
inspection, find a basis for the row space and for the column space 
of that matrix. 

11. (a) $\begin{bmatrix} 1 & 0 & 2 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix}$ (b) $\begin{bmatrix} 1 & -3 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}$ 

12. (a) $\begin{bmatrix} 1 & 2 & 4 & 5 \\ 0 & 1 & -3 & 0 \\ 0 & 0 & 1 & -3 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{bmatrix}$ (b) $\begin{bmatrix} 1 & 2 & -1 & 5 \\ 0 & 1 & 4 & 3 \\ 0 & 0 & 1 & -7 \\ 0 & 0 & 0 & 1 \end{bmatrix}$ 

13. (a) Use the methods of Examples 6 and 7 to find bases for the 
row space and column space of the matrix 

$$A = \begin{bmatrix} 1 & -2 & 5 & 0 & 3 \\ -2 & 5 & -7 & 0 & -6 \\ -1 & 3 & -2 & 1 & -3 \\ -3 & 8 & -9 & 1 & -9 \end{bmatrix}$$

(b) Use the method of Example 9 to find a basis for the row 
space of A that consists entirely of row vectors of A. 

In Exercises 14-15, find a basis for the subspace of $R^4$ that is 
spanned by the given vectors. 

14. (1, 1, -4, -3), (2, 0, 2, -2), (2, -1, 3, 2) 

15. (1, 1, 0, 0), (0, 0, 1, 1), (-2, 0, 2, 2), (0, -3, 0, 3) 

In Exercises 16-17, find a subset of the given vectors that forms 
a basis for the space spanned by those vectors, and then express 
each vector that is not in the basis as a linear combination of the 
basis vectors. 

16. $\mathbf{v}_1 = (1, 0, 1, 1)$, $\mathbf{v}_2 = (-3, 3, 7, 1)$, 
$\mathbf{v}_3 = (-1, 3, 9, 3)$, $\mathbf{v}_4 = (-5, 3, 5, -1)$ 

17. $\mathbf{v}_1 = (1, -1, 5, 2)$, $\mathbf{v}_2 = (-2, 3, 1, 0)$, 
$\mathbf{v}_3 = (4, -5, 9, 4)$, $\mathbf{v}_4 = (0, 4, 2, -3)$, 
$\mathbf{v}_5 = (-7, 18, 2, -8)$ 

In Exercises 18-19, find a basis for the row space of A that 
consists entirely of row vectors of A. 

18. The matrix in Exercise 10(a). 

19. The matrix in Exercise 10(b). 

20. Construct a matrix whose null space consists of all linear 
combinations of the vectors 

$$\mathbf{v}_1 = \begin{bmatrix} 1 \\ -1 \\ 3 \\ 2 \end{bmatrix} \quad \text{and} \quad \mathbf{v}_2 = \begin{bmatrix} 2 \\ 0 \\ -2 \\ 4 \end{bmatrix}$$

21. In each part, let $A = \begin{bmatrix} 1 & 2 & 0 \\ 1 & -1 & 4 \end{bmatrix}$. For the given vector b, 
find the general form of all vectors x in $R^3$ for which $T_A(\mathbf{x}) = \mathbf{b}$ 
if such vectors exist. 

(a) b = (0, 0) (b) b = (1, 3) (c) b = (-1, 1) 

22. In each part, let $A = \begin{bmatrix} 2 & 0 \\ 0 & 1 \\ 1 & 1 \\ 2 & 0 \end{bmatrix}$. For the given vector b, find 
the general form of all vectors x in $R^2$ for which $T_A(\mathbf{x}) = \mathbf{b}$ if 
such vectors exist. 

(a) b = (0, 0, 0, 0) (b) b = (1, 1, -1, -1) (c) b = (2, 0, 0, 2) 

23. (a) Let 

$$A = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}$$

Show that relative to an xyz-coordinate system in 3-space 
the null space of A consists of all points on the z-axis and 
that the column space consists of all points in the xy-plane 
(see the accompanying figure). 

(b) Find a 3 × 3 matrix whose null space is the x-axis and 
whose column space is the yz-plane. 

[Figure Ex-23: the null space of A and the column space of A] 

24. Find a 3 × 3 matrix whose null space is 

(a) a point. (b) a line. (c) a plane. 

25. (a) Find all 2 × 2 matrices whose null space is the line 
3x - 5y = 0. 

(b) Describe the null spaces of the following matrices: 

$$A = \begin{bmatrix} 1 & 4 \\ 0 & 5 \end{bmatrix}, \quad B = \begin{bmatrix} 1 & 0 \\ 0 & 5 \end{bmatrix}, \quad C = \begin{bmatrix} 6 & 2 \\ 3 & 1 \end{bmatrix}, \quad D = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}$$

Working with Proofs 

26. Prove Theorem 4.7.4. 

27. Prove that the row vectors of an n x n invertible matrix A 
form a basis for R n . 

28. Suppose that A and B are n x n matrices and A is invertible. 
Invent and prove a theorem that describes how the row spaces 
of AB and B are related. 

True-False Exercises 

TF. In parts (a)-(j) determine whether the statement is true or 
false, and justify your answer. 

(a) The span of $\mathbf{v}_1, \ldots, \mathbf{v}_n$ is the column space of the matrix 
whose column vectors are $\mathbf{v}_1, \ldots, \mathbf{v}_n$. 

(b) The column space of a matrix A is the set of solutions of 
Ax = b. 

(c) If R is the reduced row echelon form of A, then those column 
vectors of R that contain the leading 1's form a basis for the 
column space of A. 


(d) The set of nonzero row vectors of a matrix A is a basis for the 
row space of A. 

(e) If A and B are n x n matrices that have the same row space, 
then A and B have the same column space. 

(f) If E is an m x m elementary matrix and A is an m x n matrix, 
then the null space of EA is the same as the null space of A. 

(g) If E is an m x m elementary matrix and A is an m x n matrix, 
then the row space of EA is the same as the row space of A. 

(h) If E is an m x m elementary matrix and A is an m x n matrix, 
then the column space of EA is the same as the column space 
of A. 

(i) The system Ax = b is inconsistent if and only if b is not in the 
column space of A. 

(j) There is an invertible matrix A and a singular matrix B such 
that the row spaces of A and B are the same. 

Working with Technology 

T1. Find a basis for the column space of 

$$A = \begin{bmatrix} 2 & 6 & 0 & 8 & 4 & 12 & 4 \\ 3 & 9 & -2 & 8 & 6 & 18 & 6 \\ 3 & 9 & -7 & -2 & 6 & -3 & -1 \\ 2 & 6 & 5 & 18 & 4 & 33 & 11 \\ 1 & 3 & -2 & 0 & 2 & 6 & 2 \end{bmatrix}$$

that consists of column vectors of A. 

T2. Find a basis for the row space of the matrix A in Exercise T1 
that consists of row vectors of A. 


4.8 Rank, Nullity, and the Fundamental Matrix Spaces 

In the last section we investigated relationships between a system of linear equations and 
the row space, column space, and null space of its coefficient matrix. In this section we will 
be concerned with the dimensions of those spaces. The results we obtain will provide a 
deeper insight into the relationship between a linear system and its coefficient matrix. 


Row and Column Spaces Have Equal Dimensions 

In Examples 6 and 7 of Section 4.7 we found that the row and column spaces of the 
matrix 

$$A = \begin{bmatrix} 1 & -3 & 4 & -2 & 5 & 4 \\ 2 & -6 & 9 & -1 & 8 & 2 \\ 2 & -6 & 9 & -1 & 9 & 7 \\ -1 & 3 & -4 & 2 & -5 & -4 \end{bmatrix}$$

both have three basis vectors and hence are both three-dimensional. The fact that these 
spaces have the same dimension is not accidental, but rather a consequence of the 
following theorem. 

THEOREM 4.8.1 The row space and the column space of a matrix A have the same 
dimension. 

Proof It follows from Theorems 4.7.4 and 4.7.6(b) that elementary row operations do 
not change the dimension of the row space or of the column space of a matrix. Thus, if 
R is any row echelon form of A, it must be true that 

dim(row space of A) = dim(row space of R) 
dim(column space of A) = dim(column space of R) 

so it suffices to show that the row and column spaces of R have the same dimension. But 
the dimension of the row space of R is the number of nonzero rows, and by Theorem 
4.7.5 the dimension of the column space of R is the number of leading 1's. Since these 
two numbers are the same, the row and column spaces have the same dimension. 

Rank and Nullity 

The dimensions of the row space, column space, and null space of a matrix are such 
important numbers that there is some notation and terminology associated with them. 

DEFINITION 1 The common dimension of the row space and column space of a 
matrix A is called the rank of A and is denoted by rank(A); the dimension of the null 
space of A is called the nullity of A and is denoted by nullity(A). 

The proof of Theorem 4.8.1 shows that the rank of A can be interpreted as the number 
of leading 1's in any row echelon form of A. 


► EXAMPLE 1 Rank and Nullity of a 4 × 6 Matrix 

Find the rank and nullity of the matrix 

$$A = \begin{bmatrix} -1 & 2 & 0 & 4 & 5 & -3 \\ 3 & -7 & 2 & 0 & 1 & 4 \\ 2 & -5 & 2 & 4 & 6 & 1 \\ 4 & -9 & 2 & -4 & -4 & 7 \end{bmatrix}$$

Solution The reduced row echelon form of A is 

$$\begin{bmatrix} 1 & 0 & -4 & -28 & -37 & 13 \\ 0 & 1 & -2 & -12 & -16 & 5 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix} \tag{1}$$

(verify). Since this matrix has two leading 1's, its row and column spaces are 
two-dimensional and rank(A) = 2. To find the nullity of A, we must find the dimension of 
the solution space of the linear system Ax = 0. This system can be solved by reducing 
its augmented matrix to reduced row echelon form. The resulting matrix will be 
identical to (1), except that it will have an additional last column of zeros, and hence the 
corresponding system of equations will be 

$$\begin{aligned} x_1 - 4x_3 - 28x_4 - 37x_5 + 13x_6 &= 0 \\ x_2 - 2x_3 - 12x_4 - 16x_5 + 5x_6 &= 0 \end{aligned}$$

Solving these equations for the leading variables yields 

$$\begin{aligned} x_1 &= 4x_3 + 28x_4 + 37x_5 - 13x_6 \\ x_2 &= 2x_3 + 12x_4 + 16x_5 - 5x_6 \end{aligned} \tag{2}$$

from which we obtain the general solution 

$$\begin{aligned} x_1 &= 4r + 28s + 37t - 13u \\ x_2 &= 2r + 12s + 16t - 5u \\ x_3 &= r \\ x_4 &= s \\ x_5 &= t \\ x_6 &= u \end{aligned}$$

or in column vector form 

$$\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \end{bmatrix} = r \begin{bmatrix} 4 \\ 2 \\ 1 \\ 0 \\ 0 \\ 0 \end{bmatrix} + s \begin{bmatrix} 28 \\ 12 \\ 0 \\ 1 \\ 0 \\ 0 \end{bmatrix} + t \begin{bmatrix} 37 \\ 16 \\ 0 \\ 0 \\ 1 \\ 0 \end{bmatrix} + u \begin{bmatrix} -13 \\ -5 \\ 0 \\ 0 \\ 0 \\ 1 \end{bmatrix} \tag{3}$$

Because the four vectors on the right side of (3) form a basis for the solution space, 
nullity(A) = 4. 
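
The computation in Example 1 can be reproduced with a short script. This is a sketch under the assumption that exact rational arithmetic is wanted; `rref` and `null_space_basis` are hypothetical helper names introduced here. The rank is the number of leading 1's, and one null space basis vector is built per free variable, exactly as in formula (3).

```python
from fractions import Fraction

def rref(rows):
    """Return (R, pivots): reduced row echelon form and 0-based pivot columns."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(m[0])):
        if r == len(m):
            break
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        lead = m[r][c]
        m[r] = [x / lead for x in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
    return m, pivots

def null_space_basis(rows):
    """One basis vector per free variable: set it to 1, the other free ones to 0."""
    R, pivots = rref(rows)
    n = len(rows[0])
    basis = []
    for j in (c for c in range(n) if c not in pivots):
        v = [Fraction(0)] * n
        v[j] = Fraction(1)
        for i, p in enumerate(pivots):
            v[p] = -R[i][j]      # leading variable solved from row i of R
        basis.append(v)
    return basis

A = [[-1, 2, 0, 4, 5, -3],
     [3, -7, 2, 0, 1, 4],
     [2, -5, 2, 4, 6, 1],
     [4, -9, 2, -4, -4, 7]]

rank = len(rref(A)[1])
basis = null_space_basis(A)
nullity = len(basis)
print("rank =", rank, " nullity =", nullity)
for v in basis:
    print([int(x) for x in v])
```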

► EXAMPLE 2 Maximum Value for Rank 

What is the maximum possible rank of an m x n matrix A that is not square? 

Solution Since the row vectors of A lie in $R^n$ and the column vectors in $R^m$, the row 
space of A is at most n-dimensional and the column space is at most m-dimensional. 
Since the rank of A is the common dimension of its row and column space, it follows 
that the rank is at most the smaller of m and n. We denote this by writing 

$$\operatorname{rank}(A) \le \min(m, n)$$

in which min(m, n) is the minimum of m and n. 


The following theorem establishes a fundamental relationship between the rank and 
nullity of a matrix. 


THEOREM 4.8.2 Dimension Theorem for Matrices 

If A is a matrix with n columns, then 

$$\operatorname{rank}(A) + \operatorname{nullity}(A) = n \tag{4}$$

Proof Since A has n columns, the homogeneous linear system Ax = 0 has n unknowns 
(variables). These fall into two distinct categories: the leading variables and the free 
variables. Thus, 

(number of leading variables) + (number of free variables) = n 

But the number of leading variables is the same as the number of leading 1's in any row 
echelon form of A, which is the same as the dimension of the row space of A, which is 
the same as the rank of A. Also, the number of free variables in the general solution of 
Ax = 0 is the same as the number of parameters in that solution, which is the same as 
the dimension of the solution space of Ax = 0, which is the same as the nullity of A. 
This yields Formula (4). 

► EXAMPLE 3 The Sum of Rank and Nullity 

The matrix 

$$A = \begin{bmatrix} -1 & 2 & 0 & 4 & 5 & -3 \\ 3 & -7 & 2 & 0 & 1 & 4 \\ 2 & -5 & 2 & 4 & 6 & 1 \\ 4 & -9 & 2 & -4 & -4 & 7 \end{bmatrix}$$

has 6 columns, so 

$$\operatorname{rank}(A) + \operatorname{nullity}(A) = 6$$

This is consistent with Example 1, where we showed that 

rank(A) = 2 and nullity(A) = 4 

The following theorem, which summarizes results already obtained, interprets rank 
and nullity in the context of a homogeneous linear system. 


THEOREM 4.8.3 If A is an m × n matrix, then 

(a) rank(A) = the number of leading variables in the general solution of Ax = 0. 

(b) nullity(A) = the number of parameters in the general solution of Ax = 0. 


► EXAMPLE 4 Rank, Nullity, and Linear Systems 

(a) Find the number of parameters in the general solution of Ax = 0 if A is a 5 × 7 
matrix of rank 3. 

(b) Find the rank of a 5 × 7 matrix A for which Ax = 0 has a two-dimensional solution 
space. 

Solution (a) From (4), 

nullity(A) = n − rank(A) = 7 − 3 = 4 

Thus, there are four parameters. 

Solution (b) The matrix A has nullity 2, so 

rank(A) = n − nullity(A) = 7 − 2 = 5 

Recall from Section 4.7 that if Ax = b is a consistent linear system, then its general 
solution can be expressed as the sum of a particular solution of this system and the general 
solution of Ax = 0. We leave it as an exercise for you to use this fact and Theorem 4.8.3 
to prove the following result. 


THEOREM 4.8.4 If Ax = b is a consistent linear system of m equations in n unknowns, 
and if A has rank r, then the general solution of the system contains n − r parameters. 


The Fundamental Spaces of a Matrix 

There are six important vector spaces associated with a matrix A and its transpose $A^T$: 

row space of A          row space of $A^T$ 
column space of A       column space of $A^T$ 
null space of A         null space of $A^T$ 

However, transposing a matrix converts row vectors into column vectors and conversely, 
so except for a difference in notation, the row space of $A^T$ is the same as the column 
space of A, and the column space of $A^T$ is the same as the row space of A. Thus, of the 
six spaces listed above, only the following four are distinct: 

row space of A          column space of A 
null space of A         null space of $A^T$ 

These are called the fundamental spaces of a matrix A. We will now consider how these 
four subspaces are related. 

If A is an m × n matrix, then the row space and null space of A are subspaces of $R^n$, and 
the column space of A and the null space of $A^T$ are subspaces of $R^m$. 

Let us focus for a moment on the matrix $A^T$. Since the row space and column space 
of a matrix have the same dimension, and since transposing a matrix converts its columns 
to rows and its rows to columns, the following result should not be surprising. 

THEOREM 4.8.5 If A is any matrix, then rank(A) = rank($A^T$). 

Proof 

rank(A) = dim(row space of A) = dim(column space of $A^T$) = rank($A^T$). 

This result has some important implications. For example, if A is an m × n matrix, 
then applying Formula (4) to the matrix $A^T$ and using the fact that this matrix has m 
columns yields 

$$\operatorname{rank}(A^T) + \operatorname{nullity}(A^T) = m$$

which, by virtue of Theorem 4.8.5, can be rewritten as 

$$\operatorname{rank}(A) + \operatorname{nullity}(A^T) = m \tag{5}$$

This alternative form of Formula (4) makes it possible to express the dimensions of all 
four fundamental spaces in terms of the size and rank of A. Specifically, if rank(A) = r, 
then 

$$\begin{aligned} \dim[\operatorname{row}(A)] &= r & \dim[\operatorname{col}(A)] &= r \\ \dim[\operatorname{null}(A)] &= n - r & \dim[\operatorname{null}(A^T)] &= m - r \end{aligned} \tag{6}$$

A Geometric Link Between the Fundamental Spaces 

The four formulas in (6) provide an algebraic relationship between the size of a matrix 
and the dimensions of its fundamental spaces. Our next objective is to find a geometric 
relationship between the fundamental spaces themselves. For this purpose recall from 
Theorem 3.4.3 that if A is an m × n matrix, then the null space of A consists of those 
vectors that are orthogonal to each of the row vectors of A. To develop that idea in more 
detail, we make the following definition. 


DEFINITION 2 If W is a subspace of $R^n$, then the set of all vectors in $R^n$ that are 
orthogonal to every vector in W is called the orthogonal complement of W and is 
denoted by the symbol $W^\perp$. 


The following theorem lists three basic properties of orthogonal complements. We 
will omit the formal proof because a more general version of this theorem will be proved 
later in the text. 

THEOREM 4.8.6 If W is a subspace of $R^n$, then: 

(a) $W^\perp$ is a subspace of $R^n$. 

(b) The only vector common to W and $W^\perp$ is 0. 

(c) The orthogonal complement of $W^\perp$ is W. 

Part (b) of Theorem 4.8.6 can be expressed as 

$$W \cap W^\perp = \{\mathbf{0}\}$$

and part (c) as 

$$(W^\perp)^\perp = W$$



► EXAMPLE 5 Orthogonal Complements 

In $R^2$ the orthogonal complement of a line W through the origin is the line through the 
origin that is perpendicular to W (Figure 4.8.1a); and in $R^3$ the orthogonal complement 
of a plane W through the origin is the line through the origin that is perpendicular to 
that plane (Figure 4.8.1b). 

Explain why {0} and $R^n$ are orthogonal complements. 

[Figure 4.8.1] 



The next theorem will provide a geometric link between the fundamental spaces of 
a matrix. In the exercises we will ask you to prove that if a vector in $R^n$ is orthogonal 
to each vector in a basis for a subspace of $R^n$, then it is orthogonal to every vector in 
that subspace. Thus, part (a) of the following theorem is essentially a restatement of 
Theorem 3.4.3 in the language of orthogonal complements; it is illustrated in Example 6 
of Section 3.4. The proof of part (b), which is left as an exercise, follows from part (a). 
The essential idea of the theorem is illustrated in Figure 4.8.2. 

THEOREM 4.8.7 If A is an m × n matrix, then: 

(a) The null space of A and the row space of A are orthogonal complements in $R^n$. 

(b) The null space of $A^T$ and the column space of A are orthogonal complements in $R^m$. 
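
The orthogonality asserted in Theorem 4.8.7 can be spot-checked on the matrix of Example 1. This is only a numerical illustration under stated assumptions, not a proof: `rref`, `null_space_basis`, and `dot` are hypothetical helpers defined here. It confirms that every basis vector of the null space of A is orthogonal to every row of A, that every basis vector of the null space of $A^T$ is orthogonal to every column of A, and that the dimensions add up as Formulas (4) and (5) require.

```python
from fractions import Fraction

def rref(rows):
    """Return (R, pivots): reduced row echelon form and 0-based pivot columns."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(m[0])):
        if r == len(m):
            break
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        lead = m[r][c]
        m[r] = [x / lead for x in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
    return m, pivots

def null_space_basis(rows):
    """Basis of the solution space of the homogeneous system with these rows."""
    R, pivots = rref(rows)
    n = len(rows[0])
    basis = []
    for j in (c for c in range(n) if c not in pivots):
        v = [Fraction(0)] * n
        v[j] = Fraction(1)
        for i, p in enumerate(pivots):
            v[p] = -R[i][j]
        basis.append(v)
    return basis

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

A = [[-1, 2, 0, 4, 5, -3],
     [3, -7, 2, 0, 1, 4],
     [2, -5, 2, 4, 6, 1],
     [4, -9, 2, -4, -4, 7]]
A_T = [list(col) for col in zip(*A)]
m, n = len(A), len(A[0])

rank = len(rref(A)[1])
null_A = null_space_basis(A)       # subspace of R^n
null_AT = null_space_basis(A_T)    # subspace of R^m

# Part (a): null(A) is orthogonal to the rows of A.
part_a = all(dot(row, v) == 0 for row in A for v in null_A)
# Part (b): null(A^T) is orthogonal to the columns of A (the rows of A^T).
part_b = all(dot(col, v) == 0 for col in A_T for v in null_AT)

print("part (a) holds:", part_a, "| rank + nullity(A) =", rank + len(null_A), "= n =", n)
print("part (b) holds:", part_b, "| rank + nullity(A^T) =", rank + len(null_AT), "= m =", m)
```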
More on the Equivalence Theorem 

In Theorem 2.3.8 we listed six results that are equivalent to the invertibility of a square 
matrix A. We are now in a position to add ten more statements to that list to produce a 
single theorem that summarizes and links together all of the topics that we have covered 
thus far. We will prove some of the equivalences and leave others as exercises. 

THEOREM 4.8.8 Equivalent Statements 

If A is an n × n matrix, then the following statements are equivalent. 

(a) A is invertible. 

(b) Ax = 0 has only the trivial solution. 

(c) The reduced row echelon form of A is $I_n$. 

(d) A is expressible as a product of elementary matrices. 

(e) Ax = b is consistent for every n × 1 matrix b. 

(f) Ax = b has exactly one solution for every n × 1 matrix b. 

(g) det(A) ≠ 0. 

(h) The column vectors of A are linearly independent. 

(i) The row vectors of A are linearly independent. 

(j) The column vectors of A span $R^n$. 

(k) The row vectors of A span $R^n$. 

(l) The column vectors of A form a basis for $R^n$. 

(m) The row vectors of A form a basis for $R^n$. 

(n) A has rank n. 

(o) A has nullity 0. 

(p) The orthogonal complement of the null space of A is $R^n$. 

(q) The orthogonal complement of the row space of A is {0}. 

Proof The equivalence of (h) through (m) follows from Theorem 4.5.4 (we omit the 
details). To complete the proof we will show that (b), (n), and (o) are equivalent by 
proving the chain of implications (b) ⇒ (o) ⇒ (n) ⇒ (b). 

(b) ⇒ (o) If Ax = 0 has only the trivial solution, then there are no parameters in that 
solution, so nullity(A) = 0 by Theorem 4.8.3(b). 

(o) ⇒ (n) Theorem 4.8.2. 

(n) ⇒ (b) If A has rank n, then Theorem 4.8.3(a) implies that there are n leading variables 
(hence no free variables) in the general solution of Ax = 0. This leaves the trivial solution 
as the only possibility. 

Applications of Rank 

The advent of the Internet has stimulated research on finding efficient methods for trans- 
mitting large amounts of digital data over communications lines with limited bandwidths. 
Digital data are commonly stored in matrix form, and many techniques for improving 
transmission speed use the rank of a matrix in some way. Rank plays a role because it 
measures the “redundancy” in a matrix in the sense that if A is an m x n matrix of rank 
k, then n — k of the column vectors and m — k of the row vectors can be expressed in 
terms of k linearly independent column or row vectors. The essential idea in many data 
compression schemes is to approximate the original data set by a data set with smaller 
rank that conveys nearly the same information, then eliminate redundant vectors in the 
approximating set to speed up the transmission time. 

Overdetermined and Underdetermined Systems 

In engineering and physics, the occurrence of an overdetermined or underdetermined 
linear system often signals that one or more variables were omitted in formulating the 
problem or that extraneous variables were included. This often leads to some kind of 
complication. 

In many applications the equations in a linear system correspond to physical constraints 
or conditions that must be satisfied. In general, the most desirable systems are those that 
have the same number of constraints as unknowns since such systems often have a unique 
solution. Unfortunately, it is not always possible to match the number of constraints and 
unknowns, so researchers are often faced with linear systems that have more constraints 
than unknowns, called overdetermined systems, or with fewer constraints than unknowns, 
called underdetermined systems. The following theorem will help us to analyze both 
overdetermined and underdetermined systems. 


THEOREM 4.8.9 Let A be an m × n matrix. 

(a) (Overdetermined Case). If m > n, then the linear system Ax = b is inconsistent 
for at least one vector b in $R^m$. 

(b) (Underdetermined Case). If m < n, then for each vector b in $R^m$ the linear system 
Ax = b is either inconsistent or has infinitely many solutions. 

Proof (a) Assume that m > n, in which case the column vectors of A cannot span $R^m$ 
(fewer vectors than the dimension of $R^m$). Thus, there is at least one vector b in $R^m$ that 
is not in the column space of A, and for any such b the system Ax = b is inconsistent by 
Theorem 4.7.1. 

Proof (b) Assume that m < n. For each vector b in $R^m$ there are two possibilities: either 
the system Ax = b is consistent or it is inconsistent. If it is inconsistent, then the proof 
is complete. If it is consistent, then Theorem 4.8.4 implies that the general solution has 
n − r parameters, where r = rank(A). But we know from Example 2 that rank(A) is at 
most the smaller of m and n (which is m), so 

$$n - r \ge n - m > 0$$

This means that the general solution has at least one parameter and hence there are 
infinitely many solutions. 

► EXAMPLE 6 Overdetermined and Underdetermined Systems 

(a) What can you say about the solutions of an overdetermined system Ax = b of 7 
equations in 5 unknowns in which A has rank r = 4? 

(b) What can you say about the solutions of an underdetermined system Ax = b of 5 
equations in 7 unknowns in which A has rank r = 4? 

Solution (a) The system is consistent for some vector b in $R^7$, and for any such b the 
number of parameters in the general solution is n − r = 5 − 4 = 1. 

Solution (b) The system may be consistent or inconsistent, but if it is consistent for the 
vector b in $R^5$, then the general solution has n − r = 7 − 4 = 3 parameters. 

► EXAMPLE 7 An Overdetermined System 

The linear system 

$$\begin{aligned} x_1 - 2x_2 &= b_1 \\ x_1 - x_2 &= b_2 \\ x_1 + x_2 &= b_3 \\ x_1 + 2x_2 &= b_4 \\ x_1 + 3x_2 &= b_5 \end{aligned}$$

is overdetermined, so it cannot be consistent for all possible values of $b_1$, $b_2$, $b_3$, $b_4$, and 
$b_5$. Conditions under which the system is consistent can be obtained by solving the linear 
system by Gauss-Jordan elimination. We leave it for you to show that the augmented 
matrix is row equivalent to 

$$\begin{bmatrix} 1 & 0 & 2b_2 - b_1 \\ 0 & 1 & b_2 - b_1 \\ 0 & 0 & b_3 - 3b_2 + 2b_1 \\ 0 & 0 & b_4 - 4b_2 + 3b_1 \\ 0 & 0 & b_5 - 5b_2 + 4b_1 \end{bmatrix} \tag{7}$$

Thus, the system is consistent if and only if $b_1$, $b_2$, $b_3$, $b_4$, and $b_5$ satisfy the conditions 

$$\begin{aligned} 2b_1 - 3b_2 + b_3 &= 0 \\ 3b_1 - 4b_2 + b_4 &= 0 \\ 4b_1 - 5b_2 + b_5 &= 0 \end{aligned}$$

Solving this homogeneous linear system yields 

$b_1 = 5r - 4s$, $b_2 = 4r - 3s$, $b_3 = 2r - s$, $b_4 = r$, $b_5 = s$ 

where r and s are arbitrary. 


The coefficient matrix for the given linear system in the last example has n = 2 columns, 
and it has rank r = 2 because there are two nonzero rows in its reduced row echelon form. This 
implies that when the system is consistent its general solution will contain n — r = 0 parameters; 
that is, the solution will be unique. With a moment’s thought, you should be able to see that this 
is so from (7). 
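
The consistency conditions of Example 7 are easy to test directly. The following sketch is illustrative only; the function names `b_from_parameters` and `solve_example7` are hypothetical, introduced here for the check. It builds a right-hand side from the parameters r and s, verifies the three conditions read off from (7), and returns the unique solution when the system is consistent.

```python
def b_from_parameters(r, s):
    """Right-hand sides for which the system of Example 7 is consistent."""
    return (5*r - 4*s, 4*r - 3*s, 2*r - s, r, s)

def solve_example7(b):
    """Return (x1, x2) if the system is consistent for b, otherwise None."""
    b1, b2, b3, b4, b5 = b
    # The last three rows of (7) must have zero right-hand sides.
    conditions = (2*b1 - 3*b2 + b3, 3*b1 - 4*b2 + b4, 4*b1 - 5*b2 + b5)
    if any(c != 0 for c in conditions):
        return None
    # The first two rows of (7) give the unique solution.
    return (2*b2 - b1, b2 - b1)

b_good = b_from_parameters(2, 3)
x_good = solve_example7(b_good)
print("b =", b_good, "-> x =", x_good)

# Substitute back into the five original equations x1 + k*x2 = b.
coefficients = (-2, -1, 1, 2, 3)
residuals = [x_good[0] + k * x_good[1] - bi for k, bi in zip(coefficients, b_good)]
print("residuals:", residuals)

b_bad = (1, 0, 0, 0, 0)
print("b =", b_bad, "-> x =", solve_example7(b_bad))
```

Because the coefficient matrix has rank 2 and only 2 columns, a consistent right-hand side always yields exactly one solution, which is why the function returns a single pair rather than a parametrized family.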


Exercise Set 4.8 


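In Exercises 1-2, find the rank and nullity of the matrix A by 
reducing it to row echelon form. 

1. (a) $A = \begin{bmatrix} 2 & -1 & 1 \\ 4 & -2 & 2 \\ 6 & -3 & 3 \\ 8 & -4 & 4 \end{bmatrix}$ 

(b) $A = \begin{bmatrix} 1 & -2 & 2 & 3 & -1 \\ -3 & 6 & -1 & 1 & -7 \\ 2 & -4 & 5 & 8 & -4 \end{bmatrix}$ 

2. (a) $A = \begin{bmatrix} 1 & 0 & -2 & 1 & 0 \\ 0 & -1 & -3 & 1 & 3 \\ -2 & -1 & 1 & -1 & 3 \\ 0 & 1 & 3 & 0 & -4 \end{bmatrix}$ 

(b) $A = \begin{bmatrix} 1 & 3 & 1 & 3 \\ 0 & 1 & 1 & 0 \\ -3 & 0 & 6 & -1 \\ 3 & 4 & -2 & 1 \\ 2 & 0 & -4 & -2 \end{bmatrix}$ 

In Exercises 3-6, the matrix R is the reduced row echelon form 
of the matrix A. 

(a) By inspection of the matrix R, find the rank and nullity 
of A. 

(b) Confirm that the rank and nullity satisfy Formula (4). 

(c) Find the number of leading variables and the number 
of parameters in the general solution of Ax = 0 without 
solving the system. 

3. $A = \begin{bmatrix} 2 & -1 & -3 \\ -1 & 2 & -3 \\ 1 & 1 & 4 \end{bmatrix}$; $R = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$ 

4. $A = \begin{bmatrix} 2 & -1 & -3 \\ -1 & 2 & -3 \\ 1 & 1 & -6 \end{bmatrix}$; $R = \begin{bmatrix} 1 & 0 & -3 \\ 0 & 1 & -3 \\ 0 & 0 & 0 \end{bmatrix}$ 

5. $A = \begin{bmatrix} 2 & -1 & -3 \\ -2 & 1 & 3 \\ -4 & 2 & 6 \end{bmatrix}$; $R = \begin{bmatrix} 1 & -\frac{1}{2} & -\frac{3}{2} \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}$ 

6. $A = \begin{bmatrix} 0 & 2 & 2 & 4 \\ 1 & 0 & -1 & -3 \\ 2 & 3 & 1 & 1 \\ -2 & 1 & 3 & -2 \end{bmatrix}$; $R = \begin{bmatrix} 1 & 0 & -1 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{bmatrix}$ 

7. In each part, find the largest possible value for the rank of A 
and the smallest possible value for the nullity of A. 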

(a) A is 4 x 4 (b) A is 3 x 5 (c) A is 5 x 3 

8. If A is an m x n matrix, what is the largest possible value for 
its rank and the smallest possible value for its nullity? 

9. In each part, use the information in the table to:

(i) find the dimensions of the row space of A, column space
of A, null space of A, and null space of $A^T$;

(ii) determine whether or not the linear system Ax = b is
consistent;

(iii) find the number of parameters in the general solution of
each system in (ii) that is consistent.

                 (a)     (b)     (c)     (d)     (e)     (f)     (g)
Size of A        3 x 3   3 x 3   3 x 3   5 x 9   5 x 9   4 x 4   6 x 2
Rank(A)          3       2       1       2       2       0       2
Rank[A | b]      3       3       1       2       3       0       2


10. Verify that rank(A) = rank($A^T$).

$$A = \begin{bmatrix} 1 & 2 & 4 & 0 \\ -3 & 1 & 5 & 2 \\ -2 & 3 & 9 & 2 \end{bmatrix}$$


15. Are there values of r and s for which

$$\begin{bmatrix} 1 & 0 & 0 \\ 0 & r - 2 & 2 \\ 0 & s - 1 & r + 2 \\ 0 & 0 & 3 \end{bmatrix}$$

has rank 1? Has rank 2? If so, find those values.

16. (a) Give an example of a 3 x 3 matrix whose column space is 

a plane through the origin in 3-space. 

(b) What kind of geometric object is the null space of your 
matrix? 

(c) What kind of geometric object is the row space of your 
matrix? 

17. Suppose that A is a 3 x 3 matrix whose null space is a line 
through the origin in 3-space. Can the row or column space 
of A also be a line through the origin? Explain. 

18. (a) If A is a 3 x 5 matrix, then the rank of A is at most ____. Why?

(b) If A is a 3 x 5 matrix, then the nullity of A is at most ____. Why?

(c) If A is a 3 x 5 matrix, then the rank of $A^T$ is at most ____. Why?

(d) If A is a 3 x 5 matrix, then the nullity of $A^T$ is at most ____. Why?


11. (a) Find an equation relating nullity(A) and nullity($A^T$) for
the matrix in Exercise 10.

(b) Find an equation relating nullity(A) and nullity($A^T$) for
a general m x n matrix.

12. Let $T: R^2 \to R^3$ be the linear transformation defined by the
formula

$$T(x_1, x_2) = (x_1 + 3x_2,\; x_1 - x_2,\; x_1)$$

(a) Find the rank of the standard matrix for T.

(b) Find the nullity of the standard matrix for T.

13. Let $T: R^5 \to R^3$ be the linear transformation defined by the
formula

$$T(x_1, x_2, x_3, x_4, x_5) = (x_1 + x_2,\; x_2 + x_3 + x_4,\; x_4 + x_5)$$

(a) Find the rank of the standard matrix for T.

(b) Find the nullity of the standard matrix for T.

19. (a) If A is a 3 x 5 matrix, then the number of leading 1's in
the reduced row echelon form of A is at most ____. Why?

(b) If A is a 3 x 5 matrix, then the number of parameters in
the general solution of Ax = 0 is at most ____. Why?

(c) If A is a 5 x 3 matrix, then the number of leading 1's in
the reduced row echelon form of A is at most ____. Why?

(d) If A is a 5 x 3 matrix, then the number of parameters in
the general solution of Ax = 0 is at most ____. Why?

20. Let A be a 7 x 6 matrix such that Ax = 0 has only the trivial
solution. Find the rank and nullity of A.

21. Let A be a 5 x 7 matrix with rank 4.

(a) What is the dimension of the solution space of Ax = 0?

(b) Is Ax = b consistent for all vectors b in $R^5$? Explain.

14. Discuss how the rank of A varies with t.

(a) $A = \begin{bmatrix} 1 & 1 & t \\ 1 & t & 1 \\ t & 1 & 1 \end{bmatrix}$

(b) $A = \begin{bmatrix} t & 3 & -1 \\ 3 & 6 & -2 \\ -1 & -3 & t \end{bmatrix}$


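22. Let

$$A = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{bmatrix}$$

Show that A has rank 2 if and only if one or more of the following
determinants is nonzero.

$$\begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix}, \quad \begin{vmatrix} a_{11} & a_{13} \\ a_{21} & a_{23} \end{vmatrix}, \quad \begin{vmatrix} a_{12} & a_{13} \\ a_{22} & a_{23} \end{vmatrix}$$

23. Use the result in Exercise 22 to show that the set of points
(x, y, z) in $R^3$ for which the matrix

$$\begin{bmatrix} x & y & z \\ 1 & x & y \end{bmatrix}$$

has rank 1 is the curve with parametric equations $x = t$,
$y = t^2$, $z = t^3$.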


Working with Proofs 

29. Prove: If $k \neq 0$, then A and kA have the same rank.

30. Prove: If a matrix A is not square, then either the row vectors
or the column vectors of A are linearly dependent.

31. Use Theorem 4.8.3 to prove Theorem 4.8.4.


24. Find matrices A and B for which rank(A) = rank(B), but
rank($A^2$) $\neq$ rank($B^2$).


25. In Example 6 of Section 3.4 we showed that the row space and
the null space of the matrix

$$A = \begin{bmatrix} 1 & 3 & -2 & 0 & 2 & 0 \\ 2 & 6 & -5 & -2 & 4 & -3 \\ 0 & 0 & 5 & 10 & 0 & 15 \\ 2 & 6 & 0 & 8 & 4 & 18 \end{bmatrix}$$

are orthogonal complements in $R^6$, as guaranteed by part (a)
of Theorem 4.8.7. Show that the null space of $A^T$ and the column
space of A are orthogonal complements in $R^4$, as guaranteed
by part (b) of Theorem 4.8.7. [Suggestion: Show that each
column vector of A is orthogonal to each vector in a basis for
the null space of $A^T$.]


26. Confirm the results stated in Theorem 4.8.7 for the matrix

$$A = \begin{bmatrix} 2 & -5 & 8 & 0 & -17 \\ 1 & 3 & -5 & 1 & 5 \\ 3 & 11 & -19 & 7 & 1 \\ 1 & 7 & -13 & 5 & -3 \end{bmatrix}$$


32. Prove Theorem 4.8.1(b).

33. Prove: If a vector v in $R^n$ is orthogonal to each vector in a
basis for a subspace W of $R^n$, then v is orthogonal to every
vector in W.

True-False Exercises 

TF. In parts (a)-(j) determine whether the statement is true or
false, and justify your answer.

(a) Either the row vectors or the column vectors of a square matrix 
are linearly independent. 

(b) A matrix with linearly independent row vectors and linearly 
independent column vectors is square. 

(c) The nullity of a nonzero m x n matrix is at most m . 

(d) Adding one additional column to a matrix increases its rank 
by one. 

(e) The nullity of a square matrix with linearly dependent rows is 
at least one. 


27. In each part, state whether the system is overdetermined or
underdetermined. If overdetermined, find all values of the b's
for which it is inconsistent, and if underdetermined, find all
values of the b's for which it is inconsistent and all values for
which it has infinitely many solutions.

28. What conditions must be satisfied by $b_1$, $b_2$, $b_3$, $b_4$, and $b_5$ for
the overdetermined linear system

$$
\begin{aligned}
x_1 - 3x_2 &= b_1 \\
x_1 - 2x_2 &= b_2 \\
x_1 + x_2 &= b_3 \\
x_1 - 4x_2 &= b_4 \\
x_1 + 5x_2 &= b_5
\end{aligned}
$$

to be consistent?

(f) If A is square and Ax = b is inconsistent for some vector b,
then the nullity of A is zero.

(g) If a matrix A has more rows than columns, then the dimension
of the row space is greater than the dimension of the column
space.

(h) If rank($A^T$) = rank(A), then A is square.

(i) There is no 3 x 3 matrix whose row space and null space are
both lines in 3-space.

(j) If V is a subspace of $R^n$ and W is a subspace of V, then $W^\perp$
is a subspace of $V^\perp$.

Working with Technology

T1. It can be proved that a nonzero matrix A has rank k if and
only if some k x k submatrix has a nonzero determinant and all
square submatrices of larger size have determinant zero. Use this
fact to find the rank of

$$A = \begin{bmatrix} 3 & -1 & 3 & 2 & 5 \\ 5 & -3 & 2 & 3 & 4 \\ 1 & -3 & -5 & 0 & -7 \\ 7 & -5 & 1 & 4 & 1 \end{bmatrix}$$

Check your result by computing the rank of A in a different way.

T2. Sylvester's inequality states that if A and B are n x n matrices
with rank $r_A$ and $r_B$, respectively, then the rank $r_{AB}$ of AB satisfies
the inequality

$$r_A + r_B - n \le r_{AB} \le \min(r_A, r_B)$$

where $\min(r_A, r_B)$ denotes the smaller of $r_A$ and $r_B$ or their
common value if the two ranks are the same. Use your technology
utility to confirm this result for some matrices of your choice.
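
For readers without a dedicated technology utility, one possible way to experiment with Sylvester's inequality is sketched below in plain Python. This is an illustration only: the helper names `rank`, `matmul`, and `sylvester_holds` are assumptions, and the rank is computed by row reduction over exact fractions rather than by the determinant criterion of Exercise T1.

```python
from fractions import Fraction


def rank(matrix):
    """Rank by forward elimination with exact rational arithmetic."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    r = 0  # number of pivot rows found so far
    ncols = len(rows[0]) if rows else 0
    for col in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * p for a, p in zip(rows[i], rows[r])]
        r += 1
    return r


def matmul(A, B):
    """Product AB of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def sylvester_holds(A, B):
    """Check r_A + r_B - n <= r_AB <= min(r_A, r_B) for n x n matrices A and B."""
    n = len(A)
    r_a, r_b, r_ab = rank(A), rank(B), rank(matmul(A, B))
    return r_a + r_b - n <= r_ab <= min(r_a, r_b)


A = [[1, 0, 0], [0, 1, 0], [0, 0, 0]]
B = [[0, 0, 0], [0, 1, 0], [0, 0, 1]]
print(rank(A), rank(B), rank(matmul(A, B)), sylvester_holds(A, B))
```

In this sample pair both ranks are 2 and the product has rank 1, so the lower bound $r_A + r_B - n$ is attained.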


4.9 Basic Matrix Transformations in $R^2$ and $R^3$

In this section we will continue our study of linear transformations by considering some
basic types of matrix transformations in $R^2$ and $R^3$ that have simple geometric
interpretations. The transformations we will study here are important in such fields as
computer graphics, engineering, and physics.

There are many ways to transform the vector spaces $R^2$ and $R^3$, some of the most
important of which can be accomplished by matrix transformations using the methods
introduced in Section 1.8. For example, rotations about the origin, reflections about
lines and planes through the origin, and projections onto lines and planes through the
origin can all be accomplished using a linear operator $T_A$ in which A is an appropriate
2 x 2 or 3 x 3 matrix.

Reflection Operators

Some of the most basic matrix operators on $R^2$ and $R^3$ are those that map each point into
its symmetric image about a fixed line or a fixed plane that contains the origin; these are
called reflection operators. Table 1 shows the standard matrices for the reflections about
the coordinate axes in $R^2$, and Table 2 shows the standard matrices for the reflections
about the coordinate planes in $R^3$. In each case the standard matrix was obtained using
the following procedure introduced in Section 1.8: Find the images of the standard basis
vectors, convert those images to column vectors, and then use those column vectors as
successive columns of the standard matrix.
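
The column-by-column procedure just described can be sketched in a few lines of Python. This is a minimal illustration, not part of the original text; the function name `standard_matrix` and the sample operators are assumptions, and the sketch presumes that the function passed in really is linear.

```python
def standard_matrix(T, n):
    """Standard matrix of a linear operator T on R^n, built from the images
    of the standard basis vectors e_1, ..., e_n."""
    images = []
    for j in range(n):
        e_j = tuple(1 if i == j else 0 for i in range(n))
        images.append(T(*e_j))  # image of the j-th standard basis vector
    # The images become the successive COLUMNS of the matrix, so transpose.
    return [list(row) for row in zip(*images)]


def reflect_x_axis(x, y):
    return (x, -y)


def reflect_line_y_equals_x(x, y):
    return (y, x)


def reflect_xy_plane(x, y, z):
    return (x, y, -z)


print(standard_matrix(reflect_x_axis, 2))           # [[1, 0], [0, -1]]
print(standard_matrix(reflect_line_y_equals_x, 2))  # [[0, 1], [1, 0]]
print(standard_matrix(reflect_xy_plane, 3))         # [[1, 0, 0], [0, 1, 0], [0, 0, -1]]
```

The three printed matrices agree with the corresponding entries of Tables 1 and 2.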


Table 1

Operator                          Images of e_1 and e_2            Standard Matrix

Reflection about the x-axis       T(e_1) = T(1, 0) = (1, 0)        [ 1   0 ]
T(x, y) = (x, -y)                 T(e_2) = T(0, 1) = (0, -1)       [ 0  -1 ]

Reflection about the y-axis       T(e_1) = T(1, 0) = (-1, 0)       [ -1  0 ]
T(x, y) = (-x, y)                 T(e_2) = T(0, 1) = (0, 1)        [  0  1 ]

Reflection about the line y = x   T(e_1) = T(1, 0) = (0, 1)        [ 0  1 ]
T(x, y) = (y, x)                  T(e_2) = T(0, 1) = (1, 0)        [ 1  0 ]


Table 2

Operator                          Images of e_1, e_2, e_3               Standard Matrix

Reflection about the xy-plane     T(e_1) = T(1, 0, 0) = (1, 0, 0)       [ 1  0   0 ]
T(x, y, z) = (x, y, -z)           T(e_2) = T(0, 1, 0) = (0, 1, 0)       [ 0  1   0 ]
                                  T(e_3) = T(0, 0, 1) = (0, 0, -1)      [ 0  0  -1 ]

Reflection about the xz-plane     T(e_1) = T(1, 0, 0) = (1, 0, 0)       [ 1   0  0 ]
T(x, y, z) = (x, -y, z)           T(e_2) = T(0, 1, 0) = (0, -1, 0)      [ 0  -1  0 ]
                                  T(e_3) = T(0, 0, 1) = (0, 0, 1)       [ 0   0  1 ]

Reflection about the yz-plane     T(e_1) = T(1, 0, 0) = (-1, 0, 0)      [ -1  0  0 ]
T(x, y, z) = (-x, y, z)           T(e_2) = T(0, 1, 0) = (0, 1, 0)       [  0  1  0 ]
                                  T(e_3) = T(0, 0, 1) = (0, 0, 1)       [  0  0  1 ]


Projection Operators

Matrix operators on $R^2$ and $R^3$ that map each point into its orthogonal projection onto
a fixed line or plane through the origin are called projection operators (or more precisely,
orthogonal projection operators). Table 3 shows the standard matrices for the orthogonal
projections onto the coordinate axes in $R^2$, and Table 4 shows the standard matrices for
the orthogonal projections onto the coordinate planes in $R^3$.


Table 3


Table 4

Operator                          Images of e_1, e_2, e_3               Standard Matrix

Orthogonal projection             T(e_1) = T(1, 0, 0) = (1, 0, 0)       [ 1  0  0 ]
onto the xy-plane                 T(e_2) = T(0, 1, 0) = (0, 1, 0)       [ 0  1  0 ]
T(x, y, z) = (x, y, 0)            T(e_3) = T(0, 0, 1) = (0, 0, 0)       [ 0  0  0 ]

Orthogonal projection             T(e_1) = T(1, 0, 0) = (1, 0, 0)       [ 1  0  0 ]
onto the xz-plane                 T(e_2) = T(0, 1, 0) = (0, 0, 0)       [ 0  0  0 ]
T(x, y, z) = (x, 0, z)            T(e_3) = T(0, 0, 1) = (0, 0, 1)       [ 0  0  1 ]

Orthogonal projection             T(e_1) = T(1, 0, 0) = (0, 0, 0)       [ 0  0  0 ]
onto the yz-plane                 T(e_2) = T(0, 1, 0) = (0, 1, 0)       [ 0  1  0 ]
T(x, y, z) = (0, y, z)            T(e_3) = T(0, 0, 1) = (0, 0, 1)       [ 0  0  1 ]



Rotation Operators

Matrix operators on $R^2$ and $R^3$ that move points along arcs of circles centered at the
origin are called rotation operators. Let us consider how to find the standard matrix for
the rotation operator $T: R^2 \to R^2$ that moves points counterclockwise about the origin
through a positive angle $\theta$. As illustrated in Figure 4.9.1, the images of the standard
basis vectors are

$$T(e_1) = T(1, 0) = (\cos\theta, \sin\theta) \quad\text{and}\quad T(e_2) = T(0, 1) = (-\sin\theta, \cos\theta)$$

so it follows from Formula (14) of Section 1.8 that the standard matrix for T is

$$A = [T(e_1) \mid T(e_2)] = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$$

In keeping with common usage we will denote this operator by $R_\theta$ and call

$$R_\theta = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \tag{1}$$

the rotation matrix for $R^2$. If x = (x, y) is a vector in $R^2$, and if w = ($w_1$, $w_2$) is its
image under the rotation, then the relationship $w = R_\theta x$ can be written in component
form as

$$
\begin{aligned}
w_1 &= x\cos\theta - y\sin\theta \\
w_2 &= x\sin\theta + y\cos\theta
\end{aligned} \tag{2}
$$

These are called the rotation equations for $R^2$. These ideas are summarized in Table 5.

In the plane, counterclockwise angles are positive and clockwise angles are negative. The
rotation matrix for a clockwise rotation of $-\theta$ radians can be obtained by replacing $\theta$ by $-\theta$
in (1). After simplification this yields

$$R_{-\theta} = \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix}$$


Table 5

Operator                              Rotation Equations               Standard Matrix

Counterclockwise rotation about       w_1 = x cos θ - y sin θ          [ cos θ  -sin θ ]
the origin through an angle θ         w_2 = x sin θ + y cos θ          [ sin θ   cos θ ]




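► EXAMPLE 1 A Rotation Operator

Find the image of x = (1, 1) under a rotation of $\pi/6$ radians (= 30°) about the origin.

Solution It follows from (1) with $\theta = \pi/6$ that

$$R_{\pi/6}\,\mathbf{x} = \begin{bmatrix} \frac{\sqrt{3}}{2} & -\frac{1}{2} \\ \frac{1}{2} & \frac{\sqrt{3}}{2} \end{bmatrix} \begin{bmatrix} 1 \\ 1 \end{bmatrix} = \begin{bmatrix} \frac{\sqrt{3} - 1}{2} \\ \frac{1 + \sqrt{3}}{2} \end{bmatrix} \approx \begin{bmatrix} 0.37 \\ 1.37 \end{bmatrix}$$

or in comma-delimited notation, $R_{\pi/6}(1, 1) \approx (0.37, 1.37)$.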
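
The computation in Example 1 can be reproduced numerically. The sketch below is an illustration in plain Python, not part of the original text; the helper names `rotation_matrix` and `apply` are assumptions. It builds the matrix of Formula (1) and multiplies it by the vector.

```python
import math


def rotation_matrix(theta):
    """Standard matrix R_theta of Formula (1): counterclockwise rotation of R^2."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s],
            [s, c]]


def apply(A, x):
    """Matrix-vector product Ax, with A stored as a list of rows."""
    return tuple(sum(a * xi for a, xi in zip(row, x)) for row in A)


w = apply(rotation_matrix(math.pi / 6), (1, 1))
print(tuple(round(component, 2) for component in w))  # (0.37, 1.37)
```

Replacing `math.pi / 6` by a negative angle gives a clockwise rotation, in agreement with the formula for $R_{-\theta}$.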


Rotations in $R^3$

A rotation of vectors in $R^3$ is commonly described in relation to a line through the origin
called the axis of rotation and a unit vector u along that line (Figure 4.9.2a). The unit
vector and what is called the right-hand rule can be used to establish a sign for the angle of
rotation by cupping the fingers of your right hand so they curl in the direction of rotation
and observing the direction of your thumb. If your thumb points in the direction of u,
then the angle of rotation is regarded to be positive relative to u, and if it points in the
direction opposite to u, then it is regarded to be negative relative to u (Figure 4.9.2b).

Figure 4.9.2 (a) Angle of rotation; (b) Right-hand rule

For rotations about the coordinate axes in $R^3$, we will take the unit vectors to be i, j,
and k, in which case an angle of rotation will be positive if it is counterclockwise looking
toward the origin along the positive coordinate axis and will be negative if it is clockwise.
Table 6 shows the standard matrices for the rotation operators on $R^3$ that rotate each
vector about one of the coordinate axes through an angle $\theta$. You will find it instructive
to compare these matrices to that in Table 5.


Table 6

Operator                             Rotation Equations               Standard Matrix

Counterclockwise rotation about      w_1 = x                          [ 1    0       0    ]
the positive x-axis through          w_2 = y cos θ - z sin θ          [ 0  cos θ  -sin θ  ]
an angle θ                           w_3 = y sin θ + z cos θ          [ 0  sin θ   cos θ  ]

Counterclockwise rotation about      w_1 = x cos θ + z sin θ          [  cos θ  0  sin θ ]
the positive y-axis through          w_2 = y                          [    0    1    0   ]
an angle θ                           w_3 = -x sin θ + z cos θ         [ -sin θ  0  cos θ ]

Counterclockwise rotation about      w_1 = x cos θ - y sin θ          [ cos θ  -sin θ  0 ]
the positive z-axis through          w_2 = x sin θ + y cos θ          [ sin θ   cos θ  0 ]
an angle θ                           w_3 = z                          [   0       0    1 ]


Yaw, Pitch, and Roll

In aeronautics and astronautics, the orientation of an aircraft or
space shuttle relative to an xyz-coordinate system is often described
in terms of angles called yaw, pitch, and roll. If, for example, an
aircraft is flying along the y-axis and the xy-plane defines the horizontal,
then the aircraft's angle of rotation about the z-axis is called
the yaw, its angle of rotation about the x-axis is called the pitch, and
its angle of rotation about the y-axis is called the roll. A combination
of yaw, pitch, and roll can be achieved by a single rotation
about some axis through the origin. This is, in fact, how a space
shuttle makes attitude adjustments: it doesn't perform each rotation
separately; it calculates one axis, and rotates about that axis
to get the correct orientation. Such rotation maneuvers are used to
align an antenna, point the nose toward a celestial object, or position
a payload bay for docking.



For completeness, we note that the standard matrix for a counterclockwise rotation
through an angle $\theta$ about an axis in $R^3$, which is determined by an arbitrary unit vector
u = (a, b, c) that has its initial point at the origin, is

$$
\begin{bmatrix}
a^2(1 - \cos\theta) + \cos\theta & ab(1 - \cos\theta) - c\sin\theta & ac(1 - \cos\theta) + b\sin\theta \\
ab(1 - \cos\theta) + c\sin\theta & b^2(1 - \cos\theta) + \cos\theta & bc(1 - \cos\theta) - a\sin\theta \\
ac(1 - \cos\theta) - b\sin\theta & bc(1 - \cos\theta) + a\sin\theta & c^2(1 - \cos\theta) + \cos\theta
\end{bmatrix} \tag{3}
$$

The derivation can be found in the book Principles of Interactive Computer Graphics, by
W. M. Newman and R. F. Sproull (New York: McGraw-Hill, 1979). You may find it
instructive to derive the results in Table 6 as special cases of this more general result.
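
As a numerical companion to Formula (3), the following Python sketch builds the matrix for a given axis and angle and compares it with the z-axis entry of Table 6. It is an illustration only: the function name `axis_rotation_matrix` is an assumption, and the normalization step is an added convenience so that a non-unit axis vector can be passed in (Formula (3) itself requires a unit vector).

```python
import math


def axis_rotation_matrix(u, theta):
    """Matrix of Formula (3) for rotation through theta about the axis along u."""
    a, b, c = u
    norm = math.sqrt(a * a + b * b + c * c)
    a, b, c = a / norm, b / norm, c / norm  # assumption: rescale u to unit length
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    k = 1 - cos_t
    return [
        [a * a * k + cos_t,     a * b * k - c * sin_t, a * c * k + b * sin_t],
        [a * b * k + c * sin_t, b * b * k + cos_t,     b * c * k - a * sin_t],
        [a * c * k - b * sin_t, b * c * k + a * sin_t, c * c * k + cos_t],
    ]


theta = 0.7  # any sample angle, in radians
general = axis_rotation_matrix((0, 0, 1), theta)
table6_z = [[math.cos(theta), -math.sin(theta), 0],
            [math.sin(theta),  math.cos(theta), 0],
            [0, 0, 1]]
agree = all(abs(general[i][j] - table6_z[i][j]) < 1e-12
            for i in range(3) for j in range(3))
print(agree)  # True
```

Passing (1, 0, 0) or (0, 1, 0) as the axis reproduces the other two matrices of Table 6 in the same way.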
Dilations and Contractions

If k is a nonnegative scalar, then the operator T(x) = kx on $R^2$ or $R^3$ has the effect of
increasing or decreasing the length of each vector by a factor of k. If $0 \le k < 1$ the
operator is called a contraction with factor k, and if $k > 1$ it is called a dilation with
factor k (Figure 4.9.3). Tables 7 and 8 illustrate these operators. If k = 1, then T is the
identity operator.


► Figure 4.9.3


Table 7

Operator                         Effect on the Unit Square              Standard Matrix
T(x, y) = (kx, ky)

Contraction with factor k        (1, 0) maps to (k, 0) and              [ k  0 ]
in R^2 (0 ≤ k < 1)               (0, 1) maps to (0, k)                  [ 0  k ]

Dilation with factor k           (1, 0) maps to (k, 0) and              [ k  0 ]
in R^2 (k > 1)                   (0, 1) maps to (0, k)                  [ 0  k ]


Table 8


Expansions and Compressions

In a dilation or contraction of $R^2$ or $R^3$, all coordinates are multiplied by a nonnegative
factor k. If only one coordinate is multiplied by k, then, depending on the value of k,
the resulting operator is called a compression or expansion with factor k in the direction
of a coordinate axis. This is illustrated in Table 9 for $R^2$. The extension to $R^3$ is left as
an exercise.


Table 9 



Shears

A matrix operator of the form T(x, y) = (x + ky, y) translates a point (x, y) in the
xy-plane parallel to the x-axis by an amount ky that is proportional to the y-coordinate
of the point. This operator leaves the points on the x-axis fixed (since y = 0), but
as we progress away from the x-axis, the translation distance increases. We call this
operator the shear in the x-direction by a factor k. Similarly, a matrix operator of the
form T(x, y) = (x, y + kx) is called the shear in the y-direction by a factor k. Table 10,
which illustrates the basic information about shears in $R^2$, shows that a shear is in the
positive direction if k > 0 and the negative direction if k < 0.

► EXAMPLE 2 Effect of Matrix Operators on the Unit Square

In each part, describe the matrix operator whose standard matrix is shown, and show
its effect on the unit square.

(a) $A_1 = \begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix}$ (b) $A_2 = \begin{bmatrix} 1 & -2 \\ 0 & 1 \end{bmatrix}$ (c) $A_3 = \begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix}$ (d) $A_4 = \begin{bmatrix} 2 & 0 \\ 0 & 1 \end{bmatrix}$

Solution By comparing the forms of these matrices to those in Tables 7, 9, and 10, we
see that the matrix $A_1$ corresponds to a shear in the x-direction by a factor 2, the matrix
$A_2$ corresponds to a shear in the x-direction by a factor -2, the matrix $A_3$ corresponds
to a dilation with factor 2, and the matrix $A_4$ corresponds to an expansion in the
x-direction with factor 2. The effects of these operators on the unit square are shown in
Figure 4.9.4.


► Figure 4.9.4 (effects of $A_1$, $A_2$, $A_3$, and $A_4$ on the unit square)
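
Since the figure for Example 2 is not reproduced here, the effect of each matrix on the unit square can be tabulated instead. The short Python sketch below is an illustration, not part of the original text; the names `apply`, `UNIT_SQUARE`, and `images` are assumptions. It maps the four corners of the unit square under $A_1$ through $A_4$.

```python
def apply(A, p):
    """Matrix-vector product Ap, with A stored as a list of rows."""
    return tuple(sum(a * x for a, x in zip(row, p)) for row in A)


# Corners of the unit square, listed counterclockwise from the origin.
UNIT_SQUARE = [(0, 0), (1, 0), (1, 1), (0, 1)]

matrices = {
    "A1": [[1, 2], [0, 1]],    # shear in the x-direction by a factor 2
    "A2": [[1, -2], [0, 1]],   # shear in the x-direction by a factor -2
    "A3": [[2, 0], [0, 2]],    # dilation with factor 2
    "A4": [[2, 0], [0, 1]],    # expansion in the x-direction with factor 2
}

images = {name: [apply(A, p) for p in UNIT_SQUARE] for name, A in matrices.items()}
for name, corners in images.items():
    print(name, corners)
```

Under the two shears the bottom edge from (0, 0) to (1, 0) stays fixed while the top edge slides to the right or left, which is the behavior described for points on and off the x-axis.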


Orthogonal Projections onto Lines Through the Origin

In Table 3 we listed the standard matrices for the orthogonal projections onto the coordinate
axes in $R^2$. These are special cases of the more general matrix operator $T_A: R^2 \to R^2$
that maps each point into its orthogonal projection onto a line L through the origin that
makes an angle $\theta$ with the positive x-axis (Figure 4.9.5). In Example 4 of Section 3.3
we used Formula (10) of that section to find the orthogonal projections of the standard
basis vectors for $R^2$ onto that line. Expressed in matrix form, we found those projections
to be

$$T(e_1) = \begin{bmatrix} \cos^2\theta \\ \sin\theta\cos\theta \end{bmatrix} \quad\text{and}\quad T(e_2) = \begin{bmatrix} \sin\theta\cos\theta \\ \sin^2\theta \end{bmatrix}$$

Thus, the standard matrix for $T_A$ is

$$[T(e_1) \mid T(e_2)] = \begin{bmatrix} \cos^2\theta & \sin\theta\cos\theta \\ \sin\theta\cos\theta & \sin^2\theta \end{bmatrix} = \begin{bmatrix} \cos^2\theta & \frac{1}{2}\sin 2\theta \\ \frac{1}{2}\sin 2\theta & \sin^2\theta \end{bmatrix}$$

▲ Figure 4.9.5

In keeping with common usage, we will denote this operator by

$$P_\theta = \begin{bmatrix} \cos^2\theta & \sin\theta\cos\theta \\ \sin\theta\cos\theta & \sin^2\theta \end{bmatrix} = \begin{bmatrix} \cos^2\theta & \frac{1}{2}\sin 2\theta \\ \frac{1}{2}\sin 2\theta & \sin^2\theta \end{bmatrix} \tag{4}$$

We have included two versions of Formula (4) because both are commonly used. Whereas
the first version involves only the angle $\theta$, the second involves both $\theta$ and $2\theta$.


► EXAMPLE 3 Orthogonal Projection onto a Line Through the Origin

Use Formula (4) to find the orthogonal projection of the vector x = (1, 5) onto the line
through the origin that makes an angle of $\pi/6$ (= 30°) with the positive x-axis.

Solution Since $\sin(\pi/6) = 1/2$ and $\cos(\pi/6) = \sqrt{3}/2$, it follows from (4) that the
standard matrix for this projection is

$$P_{\pi/6} = \begin{bmatrix} \cos^2(\pi/6) & \sin(\pi/6)\cos(\pi/6) \\ \sin(\pi/6)\cos(\pi/6) & \sin^2(\pi/6) \end{bmatrix} = \begin{bmatrix} \frac{3}{4} & \frac{\sqrt{3}}{4} \\ \frac{\sqrt{3}}{4} & \frac{1}{4} \end{bmatrix}$$

Thus,

$$P_{\pi/6}\,\mathbf{x} = \begin{bmatrix} \frac{3}{4} & \frac{\sqrt{3}}{4} \\ \frac{\sqrt{3}}{4} & \frac{1}{4} \end{bmatrix} \begin{bmatrix} 1 \\ 5 \end{bmatrix} = \begin{bmatrix} \frac{3 + 5\sqrt{3}}{4} \\ \frac{\sqrt{3} + 5}{4} \end{bmatrix} \approx \begin{bmatrix} 2.91 \\ 1.68 \end{bmatrix}$$

or in comma-delimited notation, $P_{\pi/6}(1, 5) \approx (2.91, 1.68)$.


Reflections About Lines Through the Origin

In Table 1 we listed the reflections about the coordinate axes in $R^2$. These are special cases
of the more general operator $H_\theta: R^2 \to R^2$ that maps each point into its reflection about
a line L through the origin that makes an angle $\theta$ with the positive x-axis (Figure 4.9.6).
We could find the standard matrix for $H_\theta$ by finding the images of the standard basis
vectors, but instead we will take advantage of our work on orthogonal projections by
using Formula (4) for $P_\theta$ to find a formula for $H_\theta$.

▲ Figure 4.9.6

You should be able to see from Figure 4.9.7 that for every vector x in $R^2$

$$P_\theta \mathbf{x} - \mathbf{x} = \tfrac{1}{2}(H_\theta \mathbf{x} - \mathbf{x}) \quad\text{or equivalently}\quad H_\theta \mathbf{x} = (2P_\theta - I)\mathbf{x}$$

▲ Figure 4.9.7

Thus, it follows from Theorem 1.8.4 that

$$H_\theta = 2P_\theta - I \tag{5}$$

and hence from (4) that

$$H_\theta = \begin{bmatrix} \cos 2\theta & \sin 2\theta \\ \sin 2\theta & -\cos 2\theta \end{bmatrix} \tag{6}$$

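► EXAMPLE 4 Reflection About a Line Through the Origin

Find the reflection of the vector x = (1, 5) about the line through the origin that makes
an angle of $\pi/6$ (= 30°) with the x-axis.

Solution Since $\sin(\pi/3) = \sqrt{3}/2$ and $\cos(\pi/3) = 1/2$, it follows from (6) that the
standard matrix for this reflection is

$$H_{\pi/6} = \begin{bmatrix} \cos(\pi/3) & \sin(\pi/3) \\ \sin(\pi/3) & -\cos(\pi/3) \end{bmatrix} = \begin{bmatrix} \frac{1}{2} & \frac{\sqrt{3}}{2} \\ \frac{\sqrt{3}}{2} & -\frac{1}{2} \end{bmatrix}$$

Thus,

$$H_{\pi/6}\,\mathbf{x} = \begin{bmatrix} \frac{1}{2} & \frac{\sqrt{3}}{2} \\ \frac{\sqrt{3}}{2} & -\frac{1}{2} \end{bmatrix} \begin{bmatrix} 1 \\ 5 \end{bmatrix} = \begin{bmatrix} \frac{1 + 5\sqrt{3}}{2} \\ \frac{\sqrt{3} - 5}{2} \end{bmatrix} \approx \begin{bmatrix} 4.83 \\ -1.63 \end{bmatrix}$$

or in comma-delimited notation, $H_{\pi/6}(1, 5) \approx (4.83, -1.63)$.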
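
Examples 3 and 4 can be checked together, since Formula (5) builds the reflection matrix from the projection matrix. The Python sketch below is an illustration, not part of the original text; the helper names `projection_matrix`, `reflection_matrix`, and `apply` are assumptions.

```python
import math


def projection_matrix(theta):
    """P_theta from Formula (4): orthogonal projection onto the line at angle theta."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c * c, s * c],
            [s * c, s * s]]


def reflection_matrix(theta):
    """H_theta computed as 2 P_theta - I, following Formula (5)."""
    P = projection_matrix(theta)
    return [[2 * P[0][0] - 1, 2 * P[0][1]],
            [2 * P[1][0], 2 * P[1][1] - 1]]


def apply(A, x):
    """Matrix-vector product Ax, with A stored as a list of rows."""
    return tuple(sum(a * xi for a, xi in zip(row, x)) for row in A)


theta = math.pi / 6
p = apply(projection_matrix(theta), (1, 5))   # approximately (2.915, 1.683)
h = apply(reflection_matrix(theta), (1, 5))   # approximately (4.830, -1.634)
print(p, h)
```

Up to rounding error, the entries of `reflection_matrix(theta)` are $\cos 2\theta$, $\sin 2\theta$, $\sin 2\theta$, and $-\cos 2\theta$, which is Formula (6).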
Exercise Set 4.9


1. Use matrix multiplication to find the reflection of (-1, 2)
about the

(a) x-axis. (b) y-axis. (c) line y = x.

2. Use matrix multiplication to find the reflection of (a, b) about
the

(a) x-axis. (b) y-axis. (c) line y = x.

3. Use matrix multiplication to find the reflection of (2, -5, 3)
about the

(a) xy-plane. (b) xz-plane. (c) yz-plane.

4. Use matrix multiplication to find the reflection of (a, b, c)
about the

(a) xy-plane. (b) xz-plane. (c) yz-plane.

5. Use matrix multiplication to find the orthogonal projection of
(2, -5) onto the

(a) x-axis. (b) y-axis.

6. Use matrix multiplication to find the orthogonal projection of
(a, b) onto the

(a) x-axis. (b) y-axis.

7. Use matrix multiplication to find the orthogonal projection of
(-2, 1, 3) onto the

(a) xy-plane. (b) xz-plane. (c) yz-plane.

8. Use matrix multiplication to find the orthogonal projection of
(a, b, c) onto the

(a) xy-plane. (b) xz-plane. (c) yz-plane.

9. Use matrix multiplication to find the image of the vector
(3, -4) when it is rotated about the origin through an angle
of

(a) θ = 30°. (b) θ = -60°.

(c) θ = 45°. (d) θ = 90°.

10. Use matrix multiplication to find the image of the nonzero
vector v = ($v_1$, $v_2$) when it is rotated about the origin through

(a) a positive angle α. (b) a negative angle -α.

11. Use matrix multiplication to find the image of the vector
(2, -1, 2) if it is rotated

(a) 30° clockwise about the positive x-axis.

(b) 30° counterclockwise about the positive y-axis.

(c) 45° clockwise about the positive y-axis.

(d) 90° counterclockwise about the positive z-axis.

12. Use matrix multiplication to find the image of the vector
(2, -1, 2) if it is rotated

(a) 30° counterclockwise about the positive x-axis.

(b) 30° clockwise about the positive y-axis.

(c) 45° counterclockwise about the positive y-axis.

(d) 90° clockwise about the positive z-axis.

13. (a) Use matrix multiplication to find the contraction of
(-1, 2) with factor k =

(b) Use matrix multiplication to find the dilation of (-1, 2)
with factor k = 3.

14. (a) Use matrix multiplication to find the contraction of (a, b)
with factor k = 1/α, where α > 1.

(b) Use matrix multiplication to find the dilation of (a, b) with
factor k = α, where α > 1.

15. (a) Use matrix multiplication to find the contraction of
(2, -1, 3) with factor k =

(b) Use matrix multiplication to find the dilation of (2, -1, 3)
with factor k = 2.

16. (a) Use matrix multiplication to find the contraction of
(a, b, c) with factor k = 1/α, where α > 1.

(b) Use matrix multiplication to find the dilation of (a, b, c)
with factor k = α, where α > 1.

17. (a) Use matrix multiplication to find the compression of
(-1, 2) in the x-direction with factor k =

(b) Use matrix multiplication to find the compression of
(-1, 2) in the y-direction with factor k =

18. (a) Use matrix multiplication to find the expansion of (-1, 2)
in the x-direction with factor k = 3.

(b) Use matrix multiplication to find the expansion of (-1, 2)
in the y-direction with factor k = 3.

19. (a) Use matrix multiplication to find the compression of (a, b)
in the x-direction with factor k = 1/α, where α > 1.

(b) Use matrix multiplication to find the expansion of (a, b)
in the y-direction with factor k = α, where α > 1.

20. Based on Table 9, make a conjecture about the standard
matrices for the compressions with factor k in the directions of
the coordinate axes in $R^3$.

In Exercises 21-22, using Example 2 as a model, describe the
matrix operator whose standard matrix is given, and then show in a
coordinate system its effect on the unit square.


21. (a) A, = 

1 o' 
_° L 

(b) A 2 = 

’l o" 

.0 l 

(c) A 3 = 

’l o" 

j 1_ 

(d) a 4 = 

1 0 

.-5 1 
22. (a) $A_1 = \begin{bmatrix} 3 & 0 \\ 0 & 3 \end{bmatrix}$ (b) $A_2 = \begin{bmatrix} 1 & 0 \\ 0 & 3 \end{bmatrix}$

(c) $A_3 = \begin{bmatrix} 1 & 0 \\ 3 & 1 \end{bmatrix}$ (d) $A_4 = \begin{bmatrix} 1 & 0 \\ -3 & 1 \end{bmatrix}$


In each part of Exercises 23-24, the effect of some matrix
operator on the unit square is shown. Find the standard matrix for
an operator with that effect.


In Exercises 25-26, find the standard matrix for the orthogonal
projection of $R^2$ onto the stated line, and then use that matrix to
find the orthogonal projection of the given point onto that line.

25. The orthogonal projection of (3, 4) onto the line that makes
an angle of $\pi/3$ (= 60°) with the positive x-axis.

26. The orthogonal projection of (1, 2) onto the line that makes
an angle of $\pi/4$ (= 45°) with the positive x-axis.

In Exercises 27-28, find the standard matrix for the reflection
of $R^2$ about the stated line, and then use that matrix to find the
reflection of the given point about that line.

27. The reflection of (3, 4) about the line that makes an angle of
$\pi/3$ (= 60°) with the positive x-axis.

28. The reflection of (1, 2) about the line that makes an angle of
$\pi/4$ (= 45°) with the positive x-axis.

29. For each reflection operator in Table 2 use the standard matrix 
to compute T(l, 2, 3), and convince yourself that your result 
makes sense geometrically. 

30. For each orthogonal projection operator in Table 4 use the 
standard matrix to compute T(l, 2, 3), and convince yourself 
that your result makes sense geometrically. 

31. Find the standard matrix for the operator $T: R^3 \to R^3$ that

(a) rotates each vector 30° counterclockwise about the z-axis 
(looking along the positive z-axis toward the origin). 

(b) rotates each vector 45° counterclockwise about the x-axis 
(looking along the positive x-axis toward the origin). 

(c) rotates each vector 90° counterclockwise about the y-axis 
(looking along the positive y-axis toward the origin). 


32. In each part of the accompanying figure, find the standard 
matrix for the pictured operator. 



33. Use Formula (3) to find the standard matrix for a rotation 
of 180° about the axis determined by the vector v = (2, 2, 1). 
[Note: Formula (3) requires that the vector defining the axis 
of rotation have length 1 .] 

34. Use Formula (3) to find the standard matrix for a rotation 
of 7r/2 radians about the axis determined by v = (1, 1, 1). 
[Note: Formula (3) requires that the vector defining the axis 
of rotation have length 1 .] 

35. Use Formula (3) to derive the standard matrices for the rota- 
tions about the x-axis, the y-axis, and the z-axis through an 
angle of 90° in R 3 . 

36. Show that the standard matrices listed in Tables 1 and 3 are 
special cases of Formulas (4) and (6). 


37. In a sentence, describe the geometric effect of multiplying a
vector x by the matrix

$$A = \begin{bmatrix} \cos^2\theta - \sin^2\theta & -2\sin\theta\cos\theta \\ 2\sin\theta\cos\theta & \cos^2\theta - \sin^2\theta \end{bmatrix}$$

38. If multiplication by A rotates a vector x in the xy-plane
through an angle $\theta$, what is the effect of multiplying x by $A^T$?
Explain your reasoning.

39. Let $x_0$ be a nonzero column vector in $R^2$, and suppose that
$T: R^2 \to R^2$ is the transformation defined by the formula
$T(x) = x_0 + R_\theta x$, where $R_\theta$ is the standard matrix of the
rotation of $R^2$ about the origin through the angle $\theta$. Give a
geometric description of this transformation. Is it a matrix
transformation? Explain.

40. In $R^3$ the orthogonal projections onto the x-axis, y-axis, and
z-axis are

$$T_1(x, y, z) = (x, 0, 0), \quad T_2(x, y, z) = (0, y, 0), \quad T_3(x, y, z) = (0, 0, z)$$

respectively.

(a) Show that the orthogonal projections onto the coordinate
axes are matrix operators, and then find their standard
matrices.

(b) Show that if $T: R^3 \to R^3$ is an orthogonal projection onto
one of the coordinate axes, then for every vector x in $R^3$,
the vectors T(x) and x - T(x) are orthogonal.

(c) Make a sketch showing x and x - T(x) in the case where
T is the orthogonal projection onto the x-axis.


4.10 Properties of Matrix Transformations

In this section we will discuss properties of matrix transformations. We will show, for 
example, that if several matrix transformations are performed in succession, then the same 
result can be obtained by a single matrix transformation that is chosen appropriately. We 
will also explore the relationship between the invertibility of a matrix and properties of the 
corresponding transformation. 

Compositions of Matrix Transformations

Suppose that $T_A$ is a matrix transformation from $R^n$ to $R^k$ and $T_B$ is a matrix
transformation from $R^k$ to $R^m$. If $\mathbf{x}$ is a vector in $R^n$, then $T_A$ maps this vector
into a vector $T_A(\mathbf{x})$ in $R^k$, and $T_B$, in turn, maps that vector into the vector
$T_B(T_A(\mathbf{x}))$ in $R^m$. This process creates a transformation from $R^n$ to $R^m$ that we
call the composition of $T_B$ with $T_A$ and denote by the symbol

\[ T_B \circ T_A \]

which is read "$T_B$ circle $T_A$." As illustrated in Figure 4.10.1, the transformation $T_A$ in
the formula is performed first; that is,

\[ (T_B \circ T_A)(\mathbf{x}) = T_B(T_A(\mathbf{x})) \tag{1} \]

This composition is itself a matrix transformation since

\[ (T_B \circ T_A)(\mathbf{x}) = T_B(T_A(\mathbf{x})) = B(T_A(\mathbf{x})) = B(A\mathbf{x}) = (BA)\mathbf{x} \]

which shows that it is multiplication by $BA$. This is expressed by the formula

\[ T_B \circ T_A = T_{BA} \tag{2} \]
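Formula (2) can be checked on a small numerical case. The following is a minimal sketch in plain Python; the matrices `A` and `B`, the vector `x`, and the helper names `matvec` and `matmul` are illustrative assumptions, not part of the text.

```python
def matvec(M, x):
    """Multiply a matrix M (list of rows) by a column vector x (list)."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]

def matmul(P, Q):
    """Matrix product PQ for matrices stored as lists of rows."""
    return [[sum(P[i][t] * Q[t][j] for t in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

# Hypothetical sample data: A is 2x3, so T_A maps R^3 to R^2;
# B is 3x2, so T_B maps R^2 to R^3.
A = [[1, 0, 2],
     [-1, 3, 1]]
B = [[2, 1],
     [0, -1],
     [4, 0]]
x = [1, 2, 3]

step_by_step = matvec(B, matvec(A, x))    # T_B(T_A(x)): apply T_A first
single_matrix = matvec(matmul(B, A), x)   # (BA)x: one multiplication by BA
print(step_by_step, single_matrix)
```

Both computations produce the same vector, which is what Formula (2) asserts for this particular choice of data.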

► Figure 4.10.1 (diagram of $T_A$, $T_B$, and the composition $T_B \circ T_A$ acting on $\mathbf{x}$)

Compositions can be defined for any finite succession of matrix transformations 
whose domains and ranges have the appropriate dimensions. For example, to extend 
Formula (2) to three factors, consider the matrix transformations 

\[ T_A: R^n \to R^k, \quad T_B: R^k \to R^l, \quad T_C: R^l \to R^m \]

We define the composition $(T_C \circ T_B \circ T_A): R^n \to R^m$ by

\[ (T_C \circ T_B \circ T_A)(\mathbf{x}) = T_C(T_B(T_A(\mathbf{x}))) \]

As above, it can be shown that this is a matrix transformation whose standard matrix is
$CBA$ and that

\[ T_C \circ T_B \circ T_A = T_{CBA} \tag{3} \]

Sometimes we will want to refer to the standard matrix for a matrix transformation
$T: R^n \to R^m$ without giving a name to the matrix itself. In such cases we will denote the
standard matrix for $T$ by the symbol $[T]$. Thus, the equation

\[ T(\mathbf{x}) = [T]\mathbf{x} \]

states that $T(\mathbf{x})$ is the product of the standard matrix $[T]$ and the column vector $\mathbf{x}$. For
example, if $T_1: R^n \to R^k$ and if $T_2: R^k \to R^m$, then Formula (2) can be restated as

\[ [T_2 \circ T_1] = [T_2][T_1] \tag{4} \]

Similarly, Formula (3) can be restated as

\[ [T_3 \circ T_2 \circ T_1] = [T_3][T_2][T_1] \tag{5} \]

WARNING Just as it is not generally true for matrices that $AB = BA$, so it is not
generally true that

\[ T_B \circ T_A = T_A \circ T_B \]

That is, order matters when matrix transformations are composed. In those special cases
where the order does not matter we say that the linear transformations commute.

▲ Figure 4.10.3


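► EXAMPLE 1 Composition Is Not Commutative

Let $T_1: R^2 \to R^2$ be the reflection about the line $y = x$, and let $T_2: R^2 \to R^2$ be the
orthogonal projection onto the y-axis. Figure 4.10.2 illustrates graphically that $T_1 \circ T_2$
and $T_2 \circ T_1$ have different effects on a vector $\mathbf{x}$. This same conclusion can be reached by
showing that the standard matrices for $T_1$ and $T_2$ do not commute:

\[ [T_1 \circ T_2] = [T_1][T_2] = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}\begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} \]

\[ [T_2 \circ T_1] = [T_2][T_1] = \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix} \]

so $[T_2 \circ T_1] \neq [T_1 \circ T_2]$.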
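The two products in Example 1 can be reproduced mechanically. This is a small sketch in plain Python, assuming matrices are stored as lists of rows; the helper names `matmul` and `matvec` and the sample vector `x` are hypothetical.

```python
def matmul(P, Q):
    """Matrix product PQ for matrices stored as lists of rows."""
    return [[sum(P[i][t] * Q[t][j] for t in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def matvec(M, x):
    """Multiply a matrix M by a column vector x."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]

T1 = [[0, 1], [1, 0]]   # reflection about the line y = x
T2 = [[0, 0], [0, 1]]   # orthogonal projection onto the y-axis

T1_after_T2 = matmul(T1, T2)   # standard matrix of T1 o T2 (T2 acts first)
T2_after_T1 = matmul(T2, T1)   # standard matrix of T2 o T1 (T1 acts first)

x = [3, 2]                     # sample vector, chosen only for illustration
print(T1_after_T2, matvec(T1_after_T2, x))
print(T2_after_T1, matvec(T2_after_T1, x))
```

The two compositions send the sample vector to different images, in agreement with the conclusion that the operators do not commute.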




► EXAMPLE 2 Composition of Rotations Is Commutative

Let $T_1: R^2 \to R^2$ and $T_2: R^2 \to R^2$ be the matrix operators that rotate vectors about the
origin through the angles $\theta_1$ and $\theta_2$, respectively. Thus the operation

\[ (T_2 \circ T_1)(\mathbf{x}) = T_2(T_1(\mathbf{x})) \]

first rotates $\mathbf{x}$ through the angle $\theta_1$, then rotates $T_1(\mathbf{x})$ through the angle $\theta_2$. It follows
that the net effect of $T_2 \circ T_1$ is to rotate each vector in $R^2$ through the angle $\theta_1 + \theta_2$
(Figure 4.10.3). The standard matrices for these matrix operators, which are

\[ [T_1] = \begin{bmatrix} \cos\theta_1 & -\sin\theta_1 \\ \sin\theta_1 & \cos\theta_1 \end{bmatrix}, \quad [T_2] = \begin{bmatrix} \cos\theta_2 & -\sin\theta_2 \\ \sin\theta_2 & \cos\theta_2 \end{bmatrix}, \]

\[ [T_2 \circ T_1] = \begin{bmatrix} \cos(\theta_1 + \theta_2) & -\sin(\theta_1 + \theta_2) \\ \sin(\theta_1 + \theta_2) & \cos(\theta_1 + \theta_2) \end{bmatrix} \]

should satisfy (4). With the help of some basic trigonometric identities, we can confirm
that this is so as follows:
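\[ [T_2][T_1] = \begin{bmatrix} \cos\theta_2 & -\sin\theta_2 \\ \sin\theta_2 & \cos\theta_2 \end{bmatrix}\begin{bmatrix} \cos\theta_1 & -\sin\theta_1 \\ \sin\theta_1 & \cos\theta_1 \end{bmatrix} \]

\[ = \begin{bmatrix} \cos\theta_2\cos\theta_1 - \sin\theta_2\sin\theta_1 & -(\cos\theta_2\sin\theta_1 + \sin\theta_2\cos\theta_1) \\ \sin\theta_2\cos\theta_1 + \cos\theta_2\sin\theta_1 & -\sin\theta_2\sin\theta_1 + \cos\theta_2\cos\theta_1 \end{bmatrix} \]

\[ = \begin{bmatrix} \cos(\theta_1 + \theta_2) & -\sin(\theta_1 + \theta_2) \\ \sin(\theta_1 + \theta_2) & \cos(\theta_1 + \theta_2) \end{bmatrix} = [T_2 \circ T_1] \]

Using the notation $R_\theta$ for a rotation of $R^2$ about the origin through an angle $\theta$, the
computation in Example 2 shows that

\[ R_{\theta_1} R_{\theta_2} = R_{\theta_1 + \theta_2} \]

This makes sense since rotating a vector through an angle $\theta_1$ and then rotating the
resulting vector through an angle $\theta_2$ is the same as rotating the original vector through
the angle $\theta_1 + \theta_2$.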
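The identity confirmed in Example 2 can also be spot-checked in floating point. The sketch below uses only Python's `math` module; the helper names `rotation` and `matmul` and the angles 30° and 45° are arbitrary choices made for illustration.

```python
import math

def rotation(theta):
    """Standard matrix for a rotation of R^2 through the angle theta (radians)."""
    return [[math.cos(theta), -math.sin(theta)],
            [math.sin(theta),  math.cos(theta)]]

def matmul(P, Q):
    """Matrix product PQ for matrices stored as lists of rows."""
    return [[sum(P[i][t] * Q[t][j] for t in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

theta1, theta2 = math.radians(30), math.radians(45)  # sample angles

product = matmul(rotation(theta2), rotation(theta1))   # [T2][T1]
reversed_product = matmul(rotation(theta1), rotation(theta2))  # [T1][T2]
direct = rotation(theta1 + theta2)                      # rotation through theta1 + theta2

# Largest entrywise discrepancy; expected to be at rounding-error level.
max_error = max(abs(product[i][j] - direct[i][j]) for i in range(2) for j in range(2))
order_error = max(abs(product[i][j] - reversed_product[i][j])
                  for i in range(2) for j in range(2))
print(max_error, order_error)
```

Because the entries are floating-point numbers, the comparison is made up to a small tolerance rather than with exact equality.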


► EXAMPLE 3 Composition of Two Reflections

Let $T_1: R^2 \to R^2$ be the reflection about the y-axis, and let $T_2: R^2 \to R^2$ be the
reflection about the x-axis. In this case $T_1 \circ T_2$ and $T_2 \circ T_1$ are the same; both map every
vector $\mathbf{x} = (x, y)$ into its negative $-\mathbf{x} = (-x, -y)$ (Figure 4.10.4):

\[ (T_1 \circ T_2)(x, y) = T_1(x, -y) = (-x, -y) \]

\[ (T_2 \circ T_1)(x, y) = T_2(-x, y) = (-x, -y) \]

The equality of $T_1 \circ T_2$ and $T_2 \circ T_1$ can also be deduced by showing that the standard
matrices for $T_1$ and $T_2$ commute:

\[ [T_1 \circ T_2] = [T_1][T_2] = \begin{bmatrix} -1 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix} = \begin{bmatrix} -1 & 0 \\ 0 & -1 \end{bmatrix} \]

\[ [T_2 \circ T_1] = [T_2][T_1] = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}\begin{bmatrix} -1 & 0 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} -1 & 0 \\ 0 & -1 \end{bmatrix} \]

The operator $T(\mathbf{x}) = -\mathbf{x}$ on $R^2$ or $R^3$ is called the reflection about the origin. As the
foregoing computations show, the standard matrix for this operator on $R^2$ is

\[ [T] = \begin{bmatrix} -1 & 0 \\ 0 & -1 \end{bmatrix} \]



► EXAMPLE 4 Composition of Three Transformations

Find the standard matrix for the operator $T: R^3 \to R^3$ that first rotates a vector
counterclockwise about the z-axis through an angle $\theta$, then reflects the resulting vector about
the yz-plane, and then projects that vector orthogonally onto the xy-plane.

Solution The operator $T$ can be expressed as the composition

\[ T = T_3 \circ T_2 \circ T_1 \]

where $T_1$ is the rotation about the z-axis, $T_2$ is the reflection about the yz-plane, and $T_3$
is the orthogonal projection onto the xy-plane. From Tables 6, 2, and 4 of Section 4.9,
the standard matrices for these operators are

\[ [T_1] = \begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad [T_2] = \begin{bmatrix} -1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad [T_3] = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix} \]

Thus, it follows from (5) that the standard matrix for $T$ is

\[ [T] = [T_3][T_2][T_1] = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}\begin{bmatrix} -1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} -\cos\theta & \sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 0 \end{bmatrix} \]

One-to-One Matrix Transformations

Our next objective is to establish a link between the invertibility of a matrix $A$ and
properties of the corresponding matrix transformation $T_A$.


DEFINITION 1 A matrix transformation $T_A: R^n \to R^m$ is said to be one-to-one if $T_A$
maps distinct vectors (points) in $R^n$ into distinct vectors (points) in $R^m$.


(See Figure 4.10.5.) This idea can be expressed in various ways. For example, you should
be able to see that the following are just restatements of Definition 1:

1. $T_A$ is one-to-one if for each vector $\mathbf{b}$ in the range of $A$ there is exactly one vector $\mathbf{x}$ in
$R^n$ such that $T_A\mathbf{x} = \mathbf{b}$.

2. $T_A$ is one-to-one if the equality $T_A(\mathbf{u}) = T_A(\mathbf{v})$ implies that $\mathbf{u} = \mathbf{v}$.



► Figure 4.10.5 (two diagrams of mappings from $R^n$ to $R^m$: one-to-one, and not one-to-one)


Rotation operators on $R^2$ are one-to-one since distinct vectors that are rotated
through the same angle have distinct images (Figure 4.10.6). In contrast, the orthogonal
projection of $R^2$ onto the x-axis is not one-to-one because it maps distinct points on the
same vertical line into the same point (Figure 4.10.7).



▲ Figure 4.10.6 Distinct vectors $\mathbf{u}$ and $\mathbf{v}$ are rotated into distinct vectors
$T(\mathbf{u})$ and $T(\mathbf{v})$.

▲ Figure 4.10.7 The distinct points $P$ and $Q$ are mapped into the same point $M$.
Kernel and Range

In the discussion leading up to Theorem 4.2.5 we introduced the notion of the "kernel"
of a matrix transformation. The following definition formalizes this idea and defines the
companion notion of "range."


DEFINITION 2 If $T_A: R^n \to R^m$ is a matrix transformation, then the set of all vectors
in $R^n$ that $T_A$ maps into $\mathbf{0}$ is called the kernel of $T_A$ and is denoted by $\ker(T_A)$. The set
of all vectors in $R^m$ that are images under this transformation of at least one vector
in $R^n$ is called the range of $T_A$ and is denoted by $R(T_A)$.


In brief:

\[ \ker(T_A) = \text{null space of } A \tag{6} \]

\[ R(T_A) = \text{column space of } A \tag{7} \]


The key to solving a mathematical problem is often adopting the right point of view; 
and this is why, in linear algebra, we develop different ways of thinking about the same 
vector space. For example, if $A$ is an $m \times n$ matrix, here are three ways of viewing the
same subspace of $R^n$:

• Matrix view: the null space of $A$
• System view: the solution space of $A\mathbf{x} = \mathbf{0}$
• Transformation view: the kernel of $T_A$

and here are three ways of viewing the same subspace of $R^m$:

• Matrix view: the column space of $A$
• System view: all $\mathbf{b}$ in $R^m$ for which $A\mathbf{x} = \mathbf{b}$ is consistent
• Transformation view: the range of $T_A$

In the special case of a linear operator $T_A: R^n \to R^n$, the following theorem establishes
fundamental relationships between the invertibility of $A$ and properties of $T_A$.


THEOREM 4.10.1 If $A$ is an $n \times n$ matrix and $T_A: R^n \to R^n$ is the corresponding
matrix operator, then the following statements are equivalent.

(a) $A$ is invertible.

(b) The kernel of $T_A$ is $\{\mathbf{0}\}$.

(c) The range of $T_A$ is $R^n$.

(d) $T_A$ is one-to-one.


Proof We can prove this theorem by establishing the chain of implications
(a) ⇒ (b) ⇒ (c) ⇒ (d) ⇒ (a). We will prove the first two implications and leave the rest as exercises.

(a) ⇒ (b) Assume that $A$ is invertible. It follows from parts (a) and (b) of Theorem 4.8.8
that the system $A\mathbf{x} = \mathbf{0}$ has only the trivial solution and hence that the null space of $A$
is $\{\mathbf{0}\}$. Formula (6) now implies that the kernel of $T_A$ is $\{\mathbf{0}\}$.

(b) ⇒ (c) Assume that the kernel of $T_A$ is $\{\mathbf{0}\}$. It follows from Formula (6) that the null
space of $A$ is $\{\mathbf{0}\}$ and hence that $A$ has nullity 0. This in turn implies that the rank of $A$
is $n$ and hence that the column space of $A$ is all of $R^n$. Formula (7) now implies that the
range of $T_A$ is $R^n$.
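The rank and nullity bookkeeping used in the step (b) ⇒ (c) can be illustrated on two small operators. The following sketch computes the rank by Gaussian elimination with exact fractions; the function names `rank` and `describe` are hypothetical helpers, and the two sample matrices (a 90° rotation and the projection onto the x-axis) are chosen only for illustration.

```python
from fractions import Fraction

def rank(M):
    """Rank of M (list of rows), by forward elimination in exact arithmetic."""
    rows = [[Fraction(v) for v in row] for row in M]
    r = 0  # number of pivot rows found so far
    for c in range(len(rows[0])):
        # Look for a row at or below position r with a nonzero entry in column c.
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][c] / rows[r][c]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r

def describe(A):
    """Rank, nullity, and the one-to-one test for the operator T_A (A square)."""
    n = len(A[0])
    rk = rank(A)
    return {"rank": rk, "nullity": n - rk, "one_to_one": rk == n}

rotation_90 = [[0, -1], [1, 0]]    # rotation of R^2 through 90 degrees
projection_x = [[1, 0], [0, 0]]    # orthogonal projection onto the x-axis

print(describe(rotation_90))
print(describe(projection_x))
```

For the rotation the nullity is 0 and the rank is 2, so the kernel is {0} and the range is all of R^2; for the projection the nullity is 1, so neither condition holds.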
► EXAMPLE 5 The Rotation Operator on $R^2$ Is One-to-One

As was illustrated in Figure 4.10.6, the operator $T: R^2 \to R^2$ that rotates vectors through
an angle $\theta$ is one-to-one. In accordance with parts (a) and (d) of Theorem 4.10.1, show
that the standard matrix for $T$ is invertible.

Solution We will show that the standard matrix for $T$ is invertible by showing that its
determinant is nonzero. From Table 5 of Section 4.9 the standard matrix for $T$ is

\[ [T] = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \]

This matrix is invertible because

\[ \det[T] = \begin{vmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{vmatrix} = \cos^2\theta + \sin^2\theta = 1 \neq 0 \]


► EXAMPLE 6 Projection Operators Are Not One-to-One

As illustrated in Figure 4.10.7, the operator $T: R^2 \to R^2$ that projects onto the x-axis in
the xy-plane is not one-to-one. In accordance with parts (a) and (d) of Theorem 4.10.1,
show that the standard matrix for $T$ is not invertible.

Solution We will show that the standard matrix for $T$ is not invertible by showing that
its determinant is zero. From Table 3 of Section 4.9 the standard matrix for $T$ is

\[ [T] = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} \]

Since $\det[T] = 0$, the operator $T$ is not one-to-one.


Inverse of a One-to-One Matrix Operator


If $T_A: R^n \to R^n$ is a one-to-one matrix operator, then it follows from Theorem 4.10.1 that
$A$ is invertible. The matrix operator

\[ T_{A^{-1}}: R^n \to R^n \]

that corresponds to $A^{-1}$ is called the inverse operator or (more simply) the inverse of $T_A$.
This terminology is appropriate because $T_A$ and $T_{A^{-1}}$ cancel the effect of each other in
the sense that if $\mathbf{x}$ is any vector in $R^n$, then

\[ T_A(T_{A^{-1}}(\mathbf{x})) = AA^{-1}\mathbf{x} = I\mathbf{x} = \mathbf{x} \]
\[ T_{A^{-1}}(T_A(\mathbf{x})) = A^{-1}A\mathbf{x} = I\mathbf{x} = \mathbf{x} \]

or, equivalently,

\[ T_A \circ T_{A^{-1}} = T_{AA^{-1}} = T_I \]
\[ T_{A^{-1}} \circ T_A = T_{A^{-1}A} = T_I \]

From a more geometric viewpoint, if $\mathbf{w}$ is the image of $\mathbf{x}$ under $T_A$, then $T_{A^{-1}}$ maps $\mathbf{w}$
back into $\mathbf{x}$, since

\[ T_{A^{-1}}(\mathbf{w}) = T_{A^{-1}}(T_A(\mathbf{x})) = \mathbf{x} \]

This is illustrated in Figure 4.10.8 for $R^2$.


▲ Figure 4.10.8
Before considering examples, it will be helpful to touch on some notational matters.
If $T_A: R^n \to R^n$ is a one-to-one matrix operator, and if $T_A^{-1}: R^n \to R^n$ is its inverse, then
the standard matrices for these operators are related by the equation

\[ T_A^{-1} = T_{A^{-1}} \tag{8} \]

In cases where it is preferable not to assign a name to the matrix, we can express this
equation as

\[ [T^{-1}] = [T]^{-1} \tag{9} \]


► EXAMPLE 7 Standard Matrix for $T^{-1}$

Let $T: R^2 \to R^2$ be the operator that rotates each vector in $R^2$ through the angle $\theta$, so
from Table 5 of Section 4.9,

\[ [T] = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \tag{10} \]

It is evident geometrically that to undo the effect of $T$, one must rotate each vector in $R^2$
through the angle $-\theta$. But this is exactly what the operator $T^{-1}$ does, since the standard
matrix for $T^{-1}$ is

\[ [T^{-1}] = [T]^{-1} = \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix} = \begin{bmatrix} \cos(-\theta) & -\sin(-\theta) \\ \sin(-\theta) & \cos(-\theta) \end{bmatrix} \]

(verify), which is the standard matrix for a rotation through the angle $-\theta$.


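► EXAMPLE 8 Finding $T^{-1}$

Show that the operator $T: R^2 \to R^2$ defined by the equations

\[ w_1 = 2x_1 + x_2 \]
\[ w_2 = 3x_1 + 4x_2 \]

is one-to-one, and find $T^{-1}(w_1, w_2)$.

Solution The matrix form of these equations is

\[ \begin{bmatrix} w_1 \\ w_2 \end{bmatrix} = \begin{bmatrix} 2 & 1 \\ 3 & 4 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \]

so the standard matrix for $T$ is

\[ [T] = \begin{bmatrix} 2 & 1 \\ 3 & 4 \end{bmatrix} \]

This matrix is invertible (so $T$ is one-to-one) and the standard matrix for $T^{-1}$ is

\[ [T^{-1}] = [T]^{-1} = \begin{bmatrix} \tfrac{4}{5} & -\tfrac{1}{5} \\ -\tfrac{3}{5} & \tfrac{2}{5} \end{bmatrix} \]

Thus

\[ [T^{-1}]\begin{bmatrix} w_1 \\ w_2 \end{bmatrix} = \begin{bmatrix} \tfrac{4}{5} & -\tfrac{1}{5} \\ -\tfrac{3}{5} & \tfrac{2}{5} \end{bmatrix}\begin{bmatrix} w_1 \\ w_2 \end{bmatrix} = \begin{bmatrix} \tfrac{4}{5}w_1 - \tfrac{1}{5}w_2 \\ -\tfrac{3}{5}w_1 + \tfrac{2}{5}w_2 \end{bmatrix} \]

from which we conclude that

\[ T^{-1}(w_1, w_2) = \left(\tfrac{4}{5}w_1 - \tfrac{1}{5}w_2,\; -\tfrac{3}{5}w_1 + \tfrac{2}{5}w_2\right) \]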
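The inverse found in Example 8 can be verified with exact rational arithmetic. This is a minimal sketch using Python's `fractions` module; `inverse_2x2` and `matvec` are hypothetical helper names, and the test vector `x` is an arbitrary choice.

```python
from fractions import Fraction

def inverse_2x2(M):
    """Inverse of a 2x2 matrix via the adjugate formula; fails if det = 0."""
    (a, b), (c, d) = M
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is not invertible, so the operator is not one-to-one")
    det = Fraction(det)
    return [[d / det, -b / det],
            [-c / det, a / det]]

def matvec(M, x):
    """Multiply a matrix M (list of rows) by a column vector x."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]

T = [[2, 1], [3, 4]]        # standard matrix from Example 8
T_inv = inverse_2x2(T)      # standard matrix of the inverse operator

x = [1, -2]                 # sample input vector
w = matvec(T, x)            # image of x under T
recovered = matvec(T_inv, w)  # applying the inverse should return x
print(T_inv, w, recovered)
```

Applying the inverse to the image of `x` returns `x` itself, which is the cancellation property that motivates the term inverse operator.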
More on the Equivalence Theorem

As our final result in this section, we will add parts (b), (c), and (d) of Theorem 4.10.1
to Theorem 4.8.8.


THEOREM 4.10.2 Equivalent Statements

If $A$ is an $n \times n$ matrix, then the following statements are equivalent.

(a) $A$ is invertible.
(b) $A\mathbf{x} = \mathbf{0}$ has only the trivial solution.
(c) The reduced row echelon form of $A$ is $I_n$.
(d) $A$ is expressible as a product of elementary matrices.
(e) $A\mathbf{x} = \mathbf{b}$ is consistent for every $n \times 1$ matrix $\mathbf{b}$.
(f) $A\mathbf{x} = \mathbf{b}$ has exactly one solution for every $n \times 1$ matrix $\mathbf{b}$.
(g) $\det(A) \neq 0$.
(h) The column vectors of $A$ are linearly independent.
(i) The row vectors of $A$ are linearly independent.
(j) The column vectors of $A$ span $R^n$.
(k) The row vectors of $A$ span $R^n$.
(l) The column vectors of $A$ form a basis for $R^n$.
(m) The row vectors of $A$ form a basis for $R^n$.
(n) $A$ has rank $n$.
(o) $A$ has nullity 0.
(p) The orthogonal complement of the null space of $A$ is $R^n$.
(q) The orthogonal complement of the row space of $A$ is $\{\mathbf{0}\}$.
(r) The kernel of $T_A$ is $\{\mathbf{0}\}$.
(s) The range of $T_A$ is $R^n$.
(t) $T_A$ is one-to-one.



Exercise Set 4.10 

In Exercises 1-4, determine whether the operators $T_1$ and $T_2$
commute; that is, whether $T_1 \circ T_2 = T_2 \circ T_1$.

1. (a) $T_1: R^2 \to R^2$ is the reflection about the line $y = x$, and
$T_2: R^2 \to R^2$ is the orthogonal projection onto the x-axis.

(b) $T_1: R^2 \to R^2$ is the reflection about the x-axis, and
$T_2: R^2 \to R^2$ is the reflection about the line $y = x$.

2. (a) $T_1: R^2 \to R^2$ is the orthogonal projection onto the x-axis,
and $T_2: R^2 \to R^2$ is the orthogonal projection onto the
y-axis.

(b) $T_1: R^2 \to R^2$ is the rotation about the origin through an
angle of $\pi/4$, and $T_2: R^2 \to R^2$ is the reflection about the
y-axis.

3. $T_1: R^3 \to R^3$ is a dilation with factor $k$, and $T_2: R^3 \to R^3$ is a
contraction with factor $1/k$.

4. $T_1: R^3 \to R^3$ is the rotation about the x-axis through an angle
$\theta_1$, and $T_2: R^3 \to R^3$ is the rotation about the z-axis through
an angle $\theta_2$.


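In Exercises 5-6, let $T_A$ and $T_B$ be the operators whose standard
matrices are given. Find the standard matrices for $T_B \circ T_A$
and $T_A \circ T_B$.


6. \[ A = \begin{bmatrix} 6 & 3 & -1 \\ 2 & 0 & 1 \\ 4 & -3 & 6 \end{bmatrix}, \quad B = \begin{bmatrix} 4 & 0 & 4 \\ -1 & 5 & 2 \\ 2 & -3 & 8 \end{bmatrix} \]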

7. Find the standard matrix for the stated composition in R 2 . 

(a) A rotation of 90°, followed by a reflection about the line
$y = x$.

(b) An orthogonal projection onto the y-axis, followed by a 
contraction with factor k = 

(c) A reflection about the x-axis, followed by a dilation with 
factor k = 3, followed by a rotation about the origin 
of 60°. 
8. Find the standard matrix for the stated composition in $R^2$.

(a) A rotation about the origin of 60°, followed by an orthogonal
projection onto the x-axis, followed by a reflection
about the line $y = x$.

(b) A dilation with factor $k = 2$, followed by a rotation about
the origin of 45°, followed by a reflection about the y-axis.

(c) A rotation about the origin of 15°, followed by a rotation
about the origin of 105°, followed by a rotation about the
origin of 60°.

9. Find the standard matrix for the stated composition in R 3 . 

(a) A reflection about the yz-plane, followed by an orthogonal 
projection onto the xz-plane. 

(b) A rotation of 45° about the y-axis, followed by a dilation
with factor $k = \sqrt{2}$.

(c) An orthogonal projection onto the xy-plane, followed by 
a reflection about the yz-plane. 

10. Find the standard matrix for the stated composition in $R^3$.

(a) A rotation of 30° about the x-axis, followed by a rotation 
of 30° about the z-axis, followed by a contraction with 
factor k = J . 

(b) A reflection about the xy-plane, followed by a reflection 
about the xz-plane, followed by an orthogonal projection 
onto the yz-plane. 

(c) A rotation of 270° about the x-axis, followed by a rota- 
tion of 90° about the y-axis, followed by a rotation of 180° 
about the z-axis. 


11. Let $T_1(x_1, x_2) = (x_1 + x_2, x_1 - x_2)$ and
$T_2(x_1, x_2) = (3x_1, 2x_1 + 4x_2)$.

(a) Find the standard matrices for $T_1$ and $T_2$.

(b) Find the standard matrices for $T_2 \circ T_1$ and $T_1 \circ T_2$.

(c) Use the matrices obtained in part (b) to find formulas for
$T_1(T_2(x_1, x_2))$ and $T_2(T_1(x_1, x_2))$.

12. Let $T_1(x_1, x_2, x_3) = (4x_1, -2x_1 + x_2, -x_1 - 3x_2)$ and
$T_2(x_1, x_2, x_3) = (x_1 + 2x_2, -x_3, 4x_1 - x_3)$.

(a) Find the standard matrices for $T_1$ and $T_2$.

(b) Find the standard matrices for $T_2 \circ T_1$ and $T_1 \circ T_2$.

(c) Use the matrices obtained in part (b) to find formulas for
$T_1(T_2(x_1, x_2, x_3))$ and $T_2(T_1(x_1, x_2, x_3))$.

In Exercises 13-14, determine by inspection whether the stated
matrix operator is one-to-one.

13. (a) The orthogonal projection onto the x-axis in R 2 . 

(b) The reflection about the y-axis in R 2 . 

(c) The reflection about the line y = x in R 2 . 

(d) A contraction with factor k > 0 in R 2 . 


14. (a) A rotation about the z-axis in R 3 . 

(b) A reflection about the xy-plane in R 3 . 

(c) A dilation with factor k > 0 in R 3 . 

(d) An orthogonal projection onto the xz-plane in R 3 . 

In Exercises 15-16, describe in words the inverse of the given 
one-to-one operator. 

15. (a) The reflection about the x-axis on R 2 . 

(b) The rotation about the origin through an angle of $\pi/4$
on $R^2$.

(c) The dilation with factor of 3 on R 2 . 

16. (a) The reflection about the yz-plane in R 3 . 

(b) The contraction with factor | in R 3 . 

(c) The rotation through an angle of —18° about the z-axis 
in R 3 . 

In Exercises 17-18, express the equations in matrix form, and
then use parts (g) and (s) of Theorem 4.10.2 to determine whether
the operator defined by the equations is one-to-one.

17. (a) $w_1 = 8x_1 + 4x_2$
        $w_2 = 2x_1 + x_2$

    (b) $w_1 = -x_1 + 3x_2 + 2x_3$
        $w_2 = 2x_1 + 4x_3$
        $w_3 = x_1 + 3x_2 + 6x_3$

18. (a) $w_1 = 2x_1 - 3x_2$
        $w_2 = 5x_1 + x_2$

    (b) $w_1 = x_1 + 2x_2 + 3x_3$
        $w_2 = 2x_1 + 5x_2 + 3x_3$
        $w_3 = x_1 + 8x_3$

19. Determine whether the matrix operator $T: R^2 \to R^2$ defined
by the equations is one-to-one; if so, find the standard matrix
for the inverse operator, and find $T^{-1}(w_1, w_2)$.

    (a) $w_1 = x_1 + 2x_2$
        $w_2 = -x_1 + x_2$

    (b) $w_1 = 4x_1 - 6x_2$
        $w_2 = -2x_1 + 3x_2$

20. Determine whether the matrix operator $T: R^3 \to R^3$ defined
by the equations is one-to-one; if so, find the standard matrix
for the inverse operator, and find $T^{-1}(w_1, w_2, w_3)$.

    (a) $w_1 = x_1 - 2x_2 + 2x_3$
        $w_2 = 2x_1 + x_2 + x_3$
        $w_3 = x_1 + x_2$

    (b) $w_1 = x_1 - 3x_2 + 4x_3$
        $w_2 = -x_1 + x_2 + x_3$
        $w_3 = -2x_2 + 5x_3$


In Exercises 21-22, determine whether multiplication by $A$ is
a one-to-one matrix transformation.


21. (a) \[ A = \begin{bmatrix} 1 & -1 \\ 2 & 0 \\ 3 & -4 \end{bmatrix} \]

    (b) \[ A = \begin{bmatrix} 1 & 2 & 3 \\ -1 & 0 & -4 \end{bmatrix} \]

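22. (a) \[ A = \begin{bmatrix} 1 & 2 & 1 \\ 0 & 1 & 1 \\ 1 & 1 & 0 \\ 1 & 0 & -1 \end{bmatrix} \]

    (b) \[ A = \begin{bmatrix} 4 & 3 \\ 1 & 1 \end{bmatrix} \]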
In Exercises 23-24, let $T$ be multiplication by the matrix $A$.
Find

(a) a basis for the range of $T$.

(b) a basis for the kernel of $T$.

(c) the rank and nullity of $T$.

(d) the rank and nullity of $A$.


23. \[ A = \begin{bmatrix} 1 & -1 & 3 \\ 5 & 6 & -4 \\ 7 & 4 & 2 \end{bmatrix} \]

24. \[ A = \begin{bmatrix} 2 & 0 & -1 \\ 4 & 0 & -2 \\ 20 & 0 & 0 \end{bmatrix} \]

In Exercises 25-26, let $T_A: R^4 \to R^3$ be multiplication by $A$.
Find a basis for the kernel of $T_A$, and then find a basis for the
range of $T_A$ that consists of column vectors of $A$.


25. \[ A = \begin{bmatrix} 1 & 2 & -1 & -2 \\ -3 & 1 & 3 & 4 \\ -3 & 8 & 4 & 2 \end{bmatrix} \]

26. \[ A = \begin{bmatrix} 1 & 1 & 0 & 1 \\ -2 & 4 & 2 & 2 \\ -1 & 8 & 3 & 5 \end{bmatrix} \]

27. Let $A$ be an $n \times n$ matrix such that $\det(A) = 0$, and let
$T: R^n \to R^n$ be multiplication by $A$.

(a) What can you say about the range of the matrix operator
$T$? Give an example that illustrates your conclusion.

(b) What can you say about the number of vectors that $T$ maps
into $\mathbf{0}$?

28. Answer the questions in Exercise 27 in the case where
$\det(A) \neq 0$.

29. (a) Is a composition of one-to-one matrix transformations
one-to-one? Justify your conclusion.

(b) Can the composition of a one-to-one matrix transformation
and a matrix transformation that is not one-to-one
be one-to-one? Account for both possible orders of
composition and justify your conclusion.

30. Let $T_A: R^2 \to R^2$ be multiplication by

\[ A = \begin{bmatrix} \cos^2\theta - \sin^2\theta & -2\sin\theta\cos\theta \\ 2\sin\theta\cos\theta & \cos^2\theta - \sin^2\theta \end{bmatrix} \]

(a) What is the geometric effect of applying this transformation
to a vector $\mathbf{x}$ in $R^2$?

(b) Express the operator $T_A$ as a composition of two linear
operators on $R^2$.

In Exercises 31-32, use matrix inversion to confirm the stated
result in $R^2$.

31. (a) The inverse transformation for a reflection about $y = x$ is
a reflection about $y = x$.

(b) The inverse transformation for a compression along an
axis is an expansion along that axis.

32. (a) The inverse transformation for a reflection about a coordinate
axis is a reflection about that axis.

(b) The inverse transformation for a shear along a coordinate
axis is a shear along that axis.

Working with Proofs

33. Prove that the matrix transformations $T_A$ and $T_B$ commute if
and only if the matrices $A$ and $B$ commute.

34. Prove the implication (c) ⇒ (d) in Theorem 4.10.1.

35. Prove the implication (d) ⇒ (a) in Theorem 4.10.1.

True-False Exercises

TF. In parts (a)-(g) determine whether the statement is true or
false, and justify your answer.

(a) If $T_A$ and $T_B$ are matrix operators on $R^n$, then
$T_A(T_B(\mathbf{x})) = T_B(T_A(\mathbf{x}))$ for every vector $\mathbf{x}$ in $R^n$.

(b) If $T_1$ and $T_2$ are matrix operators on $R^n$, then
$[T_2 \circ T_1] = [T_2][T_1]$.

(c) A composition of two rotation operators about the origin of
$R^2$ is another rotation about the origin.

(d) A composition of two reflection operators in $R^2$ is another
reflection operator.

(e) The kernel of a matrix transformation $T_A: R^n \to R^m$ is the
same as the null space of $A$.

(f) If there is a nonzero vector in the kernel of the matrix operator
$T_A: R^n \to R^n$, then this operator is not one-to-one.

(g) If $A$ is an $n \times n$ matrix and if the linear system $A\mathbf{x} = \mathbf{0}$ has
a nontrivial solution, then the range of the matrix operator is
not $R^n$.

Working with Technology 

T1. (a) Find the standard matrix for the linear operator on $R^3$
that performs a counterclockwise rotation of 47° about 
the x-axis, followed by a counterclockwise rotation of 68° 
about the y-axis, followed by a counterclockwise rotation 
of 33° about the z-axis. 

(b) Find the image of the point (1, 1, 1) under the operator 
in part (a). 

T2. Find the standard matrix for the linear operator on $R^2$ that
first reflects each point in the plane about the line through the origin
that makes an angle of 21° with the positive x-axis and then
projects the resulting point orthogonally onto the line through the
origin that makes an angle of 51° with the positive x-axis.
4.11 Geometry of Matrix Operators on $R^2$

In applications such as computer graphics it is important to understand not only how linear 
operators on R 2 and R 3 affect individual vectors but also how they affect two-dimensional 
or three-dimensional regions. That is the focus of this section. 

Transformations of Regions

Figure 4.11.1 shows a famous picture of Albert Einstein that has been transformed in
various ways using matrix operators on $R^2$. The original image was scanned and then
digitized to decompose it into a rectangular array of pixels. Those pixels were then
transformed as follows:

• The program MATLAB was used to assign coordinates and a gray level to each pixel.
• The coordinates of the pixels were transformed by matrix multiplication.
• The pixels were then assigned their original gray levels to produce the transformed
picture.
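The three steps listed above can be sketched on a toy image. The text describes the work as done in MATLAB; the Python below is only an illustration under assumed data, with a made-up 3-by-2 grid of `(x, y, gray)` pixels, a hypothetical helper `transform_pixels`, and a horizontal shear chosen as the sample operator.

```python
# Step 1 (assumed data): each pixel carries coordinates (x, y) and a gray level 0..255.
pixels = [(0, 0, 10), (1, 0, 50), (2, 0, 90),
          (0, 1, 130), (1, 1, 170), (2, 1, 210)]

shear = [[1, 2],   # sample operator: shear in the x-direction by a factor of 2
         [0, 1]]

def transform_pixels(pixels, M):
    """Move each pixel's coordinates by the 2x2 matrix M, keeping its gray level."""
    moved = []
    for x, y, gray in pixels:
        new_x = M[0][0] * x + M[0][1] * y   # step 2: matrix multiplication
        new_y = M[1][0] * x + M[1][1] * y
        moved.append((new_x, new_y, gray))  # step 3: original gray level reattached
    return moved

sheared = transform_pixels(pixels, shear)
print(sheared)
```

The bottom row of pixels stays in place while the row at height 1 slides two units to the right, which is the behavior expected of a horizontal shear. A real image pipeline would also need to resample onto a pixel grid, which this sketch omits.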

In computer games a perception of motion is created by using matrices to rapidly 
and repeatedly transform the arrays of pixels that form the visual images. 






▲ Figure 4.11.1 (panels: Digitized scan; Rotated; Sheared horizontally; Compressed horizontally)
[Image: Arthur Sasse/AFP/Getty Images]


Images of Lines Under Matrix Operators

The effect of a matrix operator on $R^2$ can often be deduced by studying how it transforms
the points that form the unit square. The following theorem, which we state without
proof, shows that if the operator is invertible, then it maps each line segment in the unit
square into the line segment connecting the images of its endpoints. In particular, the
edges of the unit square get mapped into edges of the image (see Figure 4.11.2 in which
the edges of a unit square and the corresponding edges of its image have been numbered).




▲ Figure 4.11.2 (panels: Unit square rotated; Unit square reflected about the y-axis;
Unit square reflected about the line $y = x$)
THEOREM 4.11.1 If $T: R^2 \to R^2$ is multiplication by an invertible matrix, then:

(a) The image of a straight line is a straight line.

(b) The image of a line through the origin is a line through the origin.

(c) The images of parallel lines are parallel lines.

(d) The image of the line segment joining points $P$ and $Q$ is the line segment joining
the images of $P$ and $Q$.

(e) The images of three points lie on a line if and only if the points themselves lie on a
line.


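► EXAMPLE 1 Image of a Line

According to Theorem 4.11.1, the invertible matrix

\[ A = \begin{bmatrix} 3 & 1 \\ 2 & 1 \end{bmatrix} \]

maps the line $y = 2x + 1$ into another line. Find its equation.


Solution Let $(x, y)$ be a point on the line $y = 2x + 1$, and let $(x', y')$ be its image under
multiplication by $A$. Then

\[ \begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} 3 & 1 \\ 2 & 1 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} \quad \text{and} \quad \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 3 & 1 \\ 2 & 1 \end{bmatrix}^{-1}\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} 1 & -1 \\ -2 & 3 \end{bmatrix}\begin{bmatrix} x' \\ y' \end{bmatrix} \]

so

\[ x = x' - y' \]
\[ y = -2x' + 3y' \]

Substituting these expressions in $y = 2x + 1$ yields

\[ -2x' + 3y' = 2(x' - y') + 1 \]

or, equivalently,

\[ y' = \tfrac{4}{5}x' + \tfrac{1}{5} \]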
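The equation of the image line can be spot-checked by mapping a few points of $y = 2x + 1$ and testing whether their images satisfy $y' = \tfrac{4}{5}x' + \tfrac{1}{5}$. The sketch below uses exact fractions; the helper name `image` and the sample parameter values are arbitrary assumptions.

```python
from fractions import Fraction

A = [[3, 1],
     [2, 1]]   # the invertible matrix from Example 1

def image(point):
    """Image of a point (x, y) under multiplication by A."""
    x, y = point
    return (A[0][0] * x + A[0][1] * y, A[1][0] * x + A[1][1] * y)

# Sample points on the original line y = 2x + 1.
points_on_line = [(Fraction(t), 2 * Fraction(t) + 1) for t in (-2, 0, 1, 5)]
images = [image(p) for p in points_on_line]

# Each image (x', y') should satisfy y' = (4/5) x' + 1/5.
on_image_line = [yp == Fraction(4, 5) * xp + Fraction(1, 5) for xp, yp in images]
print(images, on_image_line)
```

Checking a handful of points does not prove the result, but any two images already determine the line, so agreement here is consistent with the derivation above.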



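► EXAMPLE 2 Transformation of the Unit Square

Sketch the image of the unit square under multiplication by the invertible matrix

\[ A = \begin{bmatrix} 0 & 1 \\ 2 & 1 \end{bmatrix} \]

Label the vertices of the image with their coordinates, and number the edges of the unit
square and their corresponding images (as in Figure 4.11.2).


Solution Since

\[ \begin{bmatrix} 0 & 1 \\ 2 & 1 \end{bmatrix}\begin{bmatrix} 0 \\ 0 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \quad \begin{bmatrix} 0 & 1 \\ 2 & 1 \end{bmatrix}\begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 0 \\ 2 \end{bmatrix}, \]

\[ \begin{bmatrix} 0 & 1 \\ 2 & 1 \end{bmatrix}\begin{bmatrix} 0 \\ 1 \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \quad \begin{bmatrix} 0 & 1 \\ 2 & 1 \end{bmatrix}\begin{bmatrix} 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 1 \\ 3 \end{bmatrix} \]

the image of the unit square is a parallelogram with vertices (0, 0), (0, 2), (1, 1), and
(1, 3) (Figure 4.11.3).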
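The vertex computations in Example 2 follow a pattern that is easy to automate. This is a minimal sketch, assuming the unit square is listed by its four vertices in counterclockwise order; the variable names are illustrative only.

```python
A = [[0, 1],
     [2, 1]]   # the invertible matrix from Example 2

# Vertices of the unit square, counterclockwise starting at the origin.
unit_square = [(0, 0), (1, 0), (1, 1), (0, 1)]

# Multiply each vertex, viewed as a column vector, by A.
image_vertices = [(A[0][0] * x + A[0][1] * y, A[1][0] * x + A[1][1] * y)
                  for x, y in unit_square]
print(image_vertices)
```

By Theorem 4.11.1(d), joining consecutive image vertices with line segments gives the edges of the image parallelogram.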


The next example illustrates a transformation of the unit square under a composition 
of matrix operators. 
► EXAMPLE 3 Transformation of the Unit Square

(a) Find the standard matrix for the operator on $R^2$ that first shears by a factor of 2 in
the x-direction and then reflects the result about the line $y = x$. Sketch the image
of the unit square under this operator.

(b) Find the standard matrix for the operator on R 2 that first reflects about y = x and 
then shears by a factor of 2 in the x-direction. Sketch the image of the unit square 
under this operator. 

(c) Confirm that the shear and the reflection in parts (a) and (b) do not commute. 


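Solution (a) The standard matrix for the shear is

\[ A_1 = \begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix} \]

and for the reflection is

\[ A_2 = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \]

Thus, the standard matrix for the shear followed by the reflection is

\[ A_2 A_1 = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}\begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 0 & 1 \\ 1 & 2 \end{bmatrix} \]

Solution (b) The standard matrix for the reflection followed by the shear is

\[ A_1 A_2 = \begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} = \begin{bmatrix} 2 & 1 \\ 1 & 0 \end{bmatrix} \]

Solution (c) The computations in Solutions (a) and (b) show that $A_1 A_2 \neq A_2 A_1$, so
the standard matrices, and hence the operators, do not commute. The same conclusion
follows from Figures 4.11.4 and 4.11.5 since the two operators produce different images
of the unit square.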
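The two orders of composition in Example 3 can be compared directly, both as matrices and through their effect on the unit square. The following sketch uses hypothetical helpers `matmul` and `square_image`; only the matrices themselves come from the example.

```python
def matmul(P, Q):
    """Matrix product PQ for matrices stored as lists of rows."""
    return [[sum(P[i][t] * Q[t][j] for t in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def square_image(M):
    """Images of the unit square's vertices (counterclockwise) under M."""
    return [(M[0][0] * x + M[0][1] * y, M[1][0] * x + M[1][1] * y)
            for x, y in [(0, 0), (1, 0), (1, 1), (0, 1)]]

A1 = [[1, 2], [0, 1]]   # shear by a factor of 2 in the x-direction
A2 = [[0, 1], [1, 0]]   # reflection about the line y = x

shear_then_reflect = matmul(A2, A1)   # the operator applied first sits on the right
reflect_then_shear = matmul(A1, A2)

print(shear_then_reflect, square_image(shear_then_reflect))
print(reflect_then_shear, square_image(reflect_then_shear))
```

The two vertex lists differ, matching the observation that the operators produce different images of the unit square.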


► Figure 4.11.4 





► Figure 4.11.5 
Geometry of Invertible Matrix Operators

In Example 3 we illustrated the effect on the unit square in $R^2$ of a composition of shears
and reflections. Our next objective is to show how to decompose any $2 \times 2$ invertible
matrix into a product of matrices in Table 1, thereby allowing us to analyze the geometric
effect of a matrix operator in $R^2$ as a composition of simpler matrix operators. The next
theorem is our first step in this direction.
THEOREM 4.11.2 If $E$ is an elementary matrix, then $T_E: R^2 \to R^2$ is one of the
following:

(a) A shear along a coordinate axis. 

(b) A reflection about y = x. 

(c) A compression along a coordinate axis. 

(d) An expansion along a coordinate axis. 

( e ) A reflection about a coordinate axis. 

(f) A compression or expansion along a coordinate axis followed by a reflection about 
a coordinate axis. 
Proof Because a $2 \times 2$ elementary matrix results from performing a single elementary
row operation on the $2 \times 2$ identity matrix, such a matrix must have one of the following
forms (verify):

\[ \begin{bmatrix} 1 & 0 \\ k & 1 \end{bmatrix}, \quad \begin{bmatrix} 1 & k \\ 0 & 1 \end{bmatrix}, \quad \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}, \quad \begin{bmatrix} k & 0 \\ 0 & 1 \end{bmatrix}, \quad \begin{bmatrix} 1 & 0 \\ 0 & k \end{bmatrix} \]

The first two matrices represent shears along coordinate axes, and the third represents
a reflection about $y = x$. If $k > 0$, the last two matrices represent compressions or
expansions along coordinate axes, depending on whether $0 < k < 1$ or $k > 1$. If $k < 0$,
and if we express $k$ in the form $k = -k_1$, where $k_1 > 0$, then the last two matrices can be
written as

\[ \begin{bmatrix} k & 0 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} -k_1 & 0 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} -1 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} k_1 & 0 \\ 0 & 1 \end{bmatrix} \tag{1} \]

\[ \begin{bmatrix} 1 & 0 \\ 0 & k \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & -k_1 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}\begin{bmatrix} 1 & 0 \\ 0 & k_1 \end{bmatrix} \tag{2} \]

Since $k_1 > 0$, the product in (1) represents a compression or expansion along the
x-axis followed by a reflection about the y-axis, and (2) represents a compression or
expansion along the y-axis followed by a reflection about the x-axis. In the case where
$k = -1$, transformations (1) and (2) are simply reflections about the y-axis and x-axis,
respectively.


We know from Theorem 4.10.2 (d) that an invertible matrix can be expressed as a 
product of elementary matrices, so Theorem 4.11.2 implies the following result. 


THEOREM 4.11.3 If $T_A: R^2 \to R^2$ is multiplication by an invertible matrix A, then the
geometric effect of $T_A$ is the same as an appropriate succession of shears, compressions,
expansions, and reflections.


The next example will illustrate how Theorems 4.11.2 and 4.11.3 together with 
Table 1 can be used to analyze the geometric effect of multiplication by a 2 x 2 invertible 
matrix. 


► EXAMPLE 4 Decomposing a Matrix Operator 

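In Example 2 we illustrated the effect on the unit square of multiplication by

$$A = \begin{bmatrix} 0 & 1 \\ 2 & 1 \end{bmatrix}$$

(see Figure 4.11.3). Express this matrix as a product of elementary matrices, and then
describe the effect of multiplication by A in terms of shears, compressions, expansions,
and reflections.

Solution The matrix A can be reduced to the identity matrix as follows:

$$\begin{bmatrix} 0 & 1 \\ 2 & 1 \end{bmatrix} \longrightarrow \begin{bmatrix} 2 & 1 \\ 0 & 1 \end{bmatrix} \longrightarrow \begin{bmatrix} 1 & \tfrac{1}{2} \\ 0 & 1 \end{bmatrix} \longrightarrow \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$$

The three steps are, in order: interchange the first and second rows; multiply the
first row by $\tfrac{1}{2}$; add $-\tfrac{1}{2}$ times the second row to the first.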
▲ Figure 4.11.6

▲ Figure 4.11.7


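These three successive row operations can be performed by multiplying A on the left
successively by

$$E_1 = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}, \quad E_2 = \begin{bmatrix} \tfrac{1}{2} & 0 \\ 0 & 1 \end{bmatrix}, \quad E_3 = \begin{bmatrix} 1 & -\tfrac{1}{2} \\ 0 & 1 \end{bmatrix}$$

Inverting these matrices and using Formula (4) of Section 1.5 yields

$$A = \begin{bmatrix} 0 & 1 \\ 2 & 1 \end{bmatrix} = E_1^{-1}E_2^{-1}E_3^{-1} = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}\begin{bmatrix} 2 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} 1 & \tfrac{1}{2} \\ 0 & 1 \end{bmatrix}$$

Reading from right to left we can now see that the geometric effect of multiplying by A
is equivalent to successively

1. shearing by a factor of $\tfrac{1}{2}$ in the x-direction;

2. expanding by a factor of 2 in the x-direction;

3. reflecting about the line $y = x$.

This is illustrated in Figure 4.11.6, whose end result agrees with that in Example 2.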
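The factorization in Example 4 can be confirmed by multiplying the three inverse elementary matrices back together. This is a minimal sketch using exact fractions; the helper `matmul` and the variable names are illustrative assumptions, not notation from the text.

```python
from fractions import Fraction

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

half = Fraction(1, 2)
E1_inv = [[0, 1], [1, 0]]      # reflection about y = x
E2_inv = [[2, 0], [0, 1]]      # expansion by a factor of 2 in the x-direction
E3_inv = [[1, half], [0, 1]]   # shear by a factor of 1/2 in the x-direction

# The rightmost factor acts first, so the shear is applied before the others.
A = matmul(E1_inv, matmul(E2_inv, E3_inv))

# Follow the corner (1, 1) of the unit square through the three steps.
corner = [[1], [1]]
after_shear = matmul(E3_inv, corner)
after_expand = matmul(E2_inv, after_shear)
after_reflect = matmul(E1_inv, after_expand)
print(A, after_reflect)
```

The product reproduces A, and the corner (1, 1) ends at the same point that multiplication by A sends it to.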



► EXAMPLE 5 Transformations with Diagonal Matrices 

Discuss the geometric effect on the unit square of multiplication by a diagonal matrix

$$A = \begin{bmatrix} k_1 & 0 \\ 0 & k_2 \end{bmatrix}$$

in which the entries $k_1$ and $k_2$ are positive real numbers ($\neq 1$).

Solution The matrix A is invertible and can be expressed as

$$A = \begin{bmatrix} k_1 & 0 \\ 0 & k_2 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & k_2 \end{bmatrix}\begin{bmatrix} k_1 & 0 \\ 0 & 1 \end{bmatrix}$$

which shows that multiplication by A causes a compression or expansion of the unit
square by a factor of $k_1$ in the x-direction followed by an expansion or compression of
the unit square by a factor of $k_2$ in the y-direction.


► EXAMPLE 6 Reflection About the Origin 

As illustrated in Figure 4.11.7, multiplication by the matrix

$$A = \begin{bmatrix} -1 & 0 \\ 0 & -1 \end{bmatrix}$$

has the geometric effect of reflecting the unit square about the origin. Note, however,
that the matrix equation

$$A = \begin{bmatrix} -1 & 0 \\ 0 & -1 \end{bmatrix} = \begin{bmatrix} -1 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}$$

together with Table 1 shows that the same result can be obtained by first reflecting the
unit square about the x-axis and then reflecting that result about the y-axis. You should
be able to see this as well from Figure 4.11.7.

► EXAMPLE 7 Reflection About the Line $y = -x$

We leave it for you to verify that multiplication by the matrix

$$A = \begin{bmatrix} 0 & -1 \\ -1 & 0 \end{bmatrix}$$

reflects the unit square about the line $y = -x$ (Figure 4.11.8). ◄


Exercise Set 4.11 


1. Use the method of Example 1 to find an equation for the image 
of the line y = 4x under multiplication by the matrix 


8. (a) Reflects about the y-axis, then expands by a factor of 5 in 
the x-direction, and then reflects about y = x. 


A = 


5 

2 


2 

1 


(b) Rotates through 30° about the origin, then shears by a fac- 
tor of —2 in the y-direction, and then expands by a factor 
of 3 in the y-direction. 


2. Use the method of Example 1 to find an equation for the image
of the line y = -4x + 3 under multiplication by the matrix



In Exercises 3-4, find an equation for the image of the line 
y = 2x that results from the stated transformation. 

3. A shear by a factor 3 in the x -direction. 

4. A compression with factor ) in the y-direction. 

In Exercises 5-6, sketch the image of the unit square under
multiplication by the given invertible matrix. As in Example 2,
number the edges of the unit square and its image so it is clear
how those edges correspond.


In each part of Exercises 9-10, determine whether the stated 
operators commute. 

9. (a) A reflection about the x-axis and a compression in the 
x -direction with factor 

(b) A reflection about the line y = x and an expansion in the 
x-direction with factor 2. 

10. (a) A shear in the y-direction by a factor | and a shear in the 
y-direction by a factor |. 

(b) A shear in the y-direction by a factor | and a shear in the 
x-direction by a factor |. 

In Exercises 11-14, express the matrix as a product of elementary
matrices, and then describe the effect of multiplication by A
in terms of shears, compressions, expansions, and reflections.


5. 


'3 -F 

6. 

' 2 r 


f4 

47 


i — 
-T- 

i 

1 

<N 

1 

1 

i 

1 

KJ 

1 

II 

1 

o 

1 

<N 

1 

II 

c4 

2 9_ 



In each part of Exercises 7-8, find the standard matrix for a
single operator that performs the stated succession of operations.

13. A =


14. A = 


-3' 

6 


7. (a) Compresses by a factor of \ in the x-direction, then ex- 
pands by a factor of 5 in the y-direction. 

(b) Expands by a factor of 5 in the y-direction, then shears by 
a factor of 2 in the y-direction. 

(c) Reflects about y = x, then rotates through an angle of 
180° about the origin. 


In each part of Exercises 15-16, describe, in words, the effect on 
the unit square of multiplication by the given diagonal matrix. 


15. (a) A = 


16. (a) A = 


-2 O' 
0 1 


(b) A = 


(b) A = 


1 O' 

0 — 5_ 

-3 O' 
0 -1 
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17. (a) Show that multiplication by 


maps each point in the plane onto the line y = 2x. 

(b) It follows from part (a) that the noncollinear points (1, 0),
(0, 1), (-1, 0) are mapped onto a line. Does this violate
part (e) of Theorem 4.11.1?

18. Find the matrix for a shear in the x-direction that transforms
the triangle with vertices (0, 0), (2, 1), and (3, 0) into a right
triangle with the right angle at the origin.

19. In accordance with part (c) of Theorem 4.11.1, show that 
multiplication by the invertible matrix 


maps the parallel lines y = 3x + 1 and y = 3x — 2 into par- 
allel lines. 

20. Draw a figure that shows the image of the triangle with vertices 
(0, 0), (1 , 0), and (0.5, 1) under a shear by a factor of 2 in the 
x-direction. 

21. (a) Draw a figure that shows the image of the triangle with 

vertices (0, 0), (1, 0), and (0.5, 1) under multiplication by 



25. Find the image of the triangle with vertices (0, 0), (1, 1), (2, 0) 
under multiplication by 



Does your answer violate part (e) of Theorem 4.11.1? Explain. 

26. In R 3 the shear in the xy-direction by a factor k is the matrix 
transformation that moves each point (x, y, z) parallel to the 
xy-plane to the new position (x + kz, y + kz, z). (See the 
accompanying figure.) 

(a) Find the standard matrix for the shear in the xy-direction
by a factor k.

(b) How would you define the shear in the xz-direction by a
factor k and the shear in the yz-direction by a factor k?
What are the standard matrices for these matrix transformations?



Working with Proofs 


(b) Find a succession of shears, compressions, expansions, 
and reflections that produces the same image. 

22. Find the endpoints of the line segment that results when the
line segment from P(1, 2) to Q(3, 4) is transformed by

(a) a compression with factor | in the y-direction. 

(b) a rotation of 30° about the origin. 

23. Draw a figure showing the italicized letter “7”’ that results 
when the letter in the accompanying figure is sheared by a 
factor f in the x -direction. 


◄ Figure Ex-23 (labeled points in the figure: (0, .90), (.45, 0), (.55, 0))


27. Prove part (a) of Theorem 4.11.1. [Hint: A line in the plane 
has an equation of the form Ax + By + C = 0, where A and 
B are not both zero. Use the method of Example 1 to show 
that the image of this line under multiplication by the invertible 
matrix 

a b 
c d 

has the equation A'x + B'y + C = 0, where

A' = (dA - cB)/(ad - bc)

and

B' = (-bA + aB)/(ad - bc)

Then show that A' and B' are not both zero to conclude that 
the image is a line.] 

28. Use the hint in Exercise 27 to prove parts ( b ) and (c) of Theo- 
rem 4.11.1. 

True-False Exercises 

TF. In parts (a)-(g) determine whether the statement is true or 
false, and justify your answer. 

(a) The image of the unit square under a one-to-one matrix oper- 
ator is a square. 


(b) A 2 × 2 invertible matrix operator has the geometric effect of
a succession of shears, compressions, expansions, and reflections.

24. Can an invertible matrix operator on $R^2$ map a square region
into a triangular region? Justify your answer.
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(c) The image of a line under an invertible matrix operator is a 
line. 

(d) Every reflection operator on R 2 is its own inverse. 


(e) The matrix 


1 

1 



represents reflection about a line. 


(f) The matrix

$$\begin{bmatrix} 1 & -2 \\ 2 & 1 \end{bmatrix}$$

represents a shear.


(g) The matrix

$$\begin{bmatrix} 1 & 0 \\ 0 & 3 \end{bmatrix}$$

represents an expansion.


Supplementary Exercises 


1. Let V be the set of all ordered triples of real numbers, and
consider the following addition and scalar multiplication operations
on $\mathbf{u} = (u_1, u_2, u_3)$ and $\mathbf{v} = (v_1, v_2, v_3)$:

$$\mathbf{u} + \mathbf{v} = (u_1 + v_1,\; u_2 + v_2,\; u_3 + v_3), \qquad k\mathbf{u} = (ku_1, 0, 0)$$

(a) Compute $\mathbf{u} + \mathbf{v}$ and $k\mathbf{u}$ for $\mathbf{u} = (3, -2, 4)$,
$\mathbf{v} = (1, 5, -2)$, and $k = -1$.

(b) In words, explain why V is closed under addition and 
scalar multiplication. 

(c) Since the addition operation on V is the standard addition
operation on $R^3$, certain vector space axioms hold for V
because they are known to hold for $R^3$. Which axioms in
Definition 1 of Section 4.1 are they?

(d) Show that Axioms 7, 8 , and 9 hold. 

(e) Show that Axiom 10 fails for the given operations. 


2. In each part, the solution space of the system is a subspace of 
R 3 and so must be a line through the origin, a plane through 
the origin, all of R 3 , or the origin only. For each system, de- 
termine which is the case. If the subspace is a plane, find an 
equation for it, and if it is a line, find parametric equations. 


(a) 0x + 0y + 0z = 0

(b) 2x - 3y + z = 0
6x - 9y + 3z = 0
-4x + 6y - 2z = 0

(c) x - 2y + 7z = 0
-4x + 8y + 5z = 0
2x - 4y + 3z = 0

(d) x + 4y + 8z = 0
2x + 5y + 6z = 0
3x + y - 4z = 0


3. For what values of s is the solution space of

$$\begin{aligned} x_1 + x_2 + sx_3 &= 0 \\ x_1 + sx_2 + x_3 &= 0 \\ sx_1 + x_2 + x_3 &= 0 \end{aligned}$$

the origin only, a line through the origin, a plane through the
origin, or all of $R^3$?


4. (a) Express (4a, a — b, a + 2b) as a linear combination of 
(4, 1,1) and (0, -1,2). 

(b) Express (3a + b + 3c, -a + 4b - c, 2a + b + 2c) as a
linear combination of (3, -1, 2) and (1, 4, 1).

(c) Express (2a — b + 4c, 3a — c, 4b + c) as a linear combi- 
nation of three nonzero vectors. 


5. Let W be the space spanned by $\mathbf{f} = \sin x$ and $\mathbf{g} = \cos x$.

(a) Show that for any value of $\theta$, $\mathbf{f}_1 = \sin(x + \theta)$ and
$\mathbf{g}_1 = \cos(x + \theta)$ are vectors in W.

(b) Show that $\mathbf{f}_1$ and $\mathbf{g}_1$ form a basis for W.

6. (a) Express $\mathbf{v} = (1, 1)$ as a linear combination of
$\mathbf{v}_1 = (1, -1)$, $\mathbf{v}_2 = (3, 0)$, and $\mathbf{v}_3 = (2, 1)$ in two different
ways.

(b) Explain why this does not violate Theorem 4.4.1.


7. Let A be an n × n matrix, and let $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$ be linearly
independent vectors in $R^n$ expressed as n × 1 matrices. What
must be true about A for $A\mathbf{v}_1, A\mathbf{v}_2, \ldots, A\mathbf{v}_n$ to be linearly
independent?

8. Must a basis for $P_n$ contain a polynomial of degree k for each
k = 0, 1, 2, . . . , n? Justify your answer.

9. For the purpose of this exercise, let us define a “checkerboard
matrix” to be a square matrix $A = [a_{ij}]$ such that

$$a_{ij} = \begin{cases} 1 & \text{if } i + j \text{ is even} \\ 0 & \text{if } i + j \text{ is odd} \end{cases}$$

Find the rank and nullity of the following checkerboard 
matrices. 

(a) The 3x3 checkerboard matrix. 

(b) The 4x4 checkerboard matrix. 

(c) The n x n checkerboard matrix. 


10. For the purpose of this exercise, let us define an “X-matrix” to
be a square matrix with an odd number of rows and columns
that has 0's everywhere except on the two diagonals where it
has 1's. Find the rank and nullity of the following X-matrices.

(a) $\begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix}$

(b) $\begin{bmatrix} 1 & 0 & 0 & 0 & 1 \\ 0 & 1 & 0 & 1 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 1 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 & 1 \end{bmatrix}$

(c) the X-matrix of size (2n + 1) × (2n + 1)
11. In each part, show that the stated set of polynomials is a
subspace of $P_n$ and find a basis for it.

(a) All polynomials in $P_n$ such that $p(-x) = p(x)$.

(b) All polynomials in $P_n$ such that $p(0) = p(1)$.

12. (Calculus required) Show that the set of all polynomials in $P_n$
that have a horizontal tangent at x = 0 is a subspace of $P_n$.
Find a basis for this subspace.

13. (a) Find a basis for the vector space of all 3 x 3 symmetric 

matrices. 

(b) Find a basis for the vector space of all 3x3 skew- 
symmetric matrices. 


14. Various advanced texts in linear algebra prove the following 
determinant criterion for rank: The rank of a matrix A is r if 
and only if A has some r x r submatrix with a nonzero determi- 
nant, and all square submatrices of larger size have determinant 
zero. [Note: A submatrix of A is any matrix obtained by 
deleting rows or columns of A. The matrix A itself is also 
considered to be a submatrix of A.] In each part, use this 
criterion to find the rank of the matrix. 


(a) 


1 

2 


2 0 " 
4 -1 



2 

4 


3' 

6 


(c) 


1 

2 

3 


0 

-1 

-1 


1 

3 

4 



-1 

1 

2 


2 0 
0 0 
4 0 


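15. Use the result in Exercise 14 above to find the possible ranks
for matrices of the form

$$\begin{bmatrix} 0 & 0 & 0 & 0 & 0 & a_{16} \\ 0 & 0 & 0 & 0 & 0 & a_{26} \\ 0 & 0 & 0 & 0 & 0 & a_{36} \\ 0 & 0 & 0 & 0 & 0 & a_{46} \\ a_{51} & a_{52} & a_{53} & a_{54} & a_{55} & a_{56} \end{bmatrix}$$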

16. Prove: If S is a basis for a vector space V, then for any vectors
u and v in V and any scalar k, the following relationships hold.

(a) $(\mathbf{u} + \mathbf{v})_S = (\mathbf{u})_S + (\mathbf{v})_S$ (b) $(k\mathbf{u})_S = k(\mathbf{u})_S$

17. Let $D_k$, $R_\theta$, and $S_k$ be a dilation of $R^2$ with factor k, a
counterclockwise rotation about the origin of $R^2$ through an angle
$\theta$, and a shear of $R^2$ by a factor k, respectively.

(a) Do $D_k$ and $R_\theta$ commute?

(b) Do $R_\theta$ and $S_k$ commute?

(c) Do $D_k$ and $S_k$ commute?

18. A vector space V is said to be the direct sum of its subspaces
U and W, written $V = U \oplus W$, if every vector in V can be
expressed in exactly one way as v = u + w, where u is a vector
in U and w is a vector in W.

(a) Prove that $V = U \oplus W$ if and only if every vector in V is
the sum of some vector in U and some vector in W and
$U \cap W = \{\mathbf{0}\}$.

(b) Let U be the xy-plane and W the z-axis in $R^3$. Is it true
that $R^3 = U \oplus W$? Explain.

(c) Let U be the xy-plane and W the yz-plane in $R^3$. Can every
vector in $R^3$ be expressed as the sum of a vector in U
and a vector in W? Is it true that $R^3 = U \oplus W$? Explain.


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INTRODUCTION In this chapter we will focus on classes of scalars and vectors known as “eigenvalues” 
and “eigenvectors,” terms derived from the German word eigen , meaning “own,” 
“peculiar to,” “characteristic,” or “individual.” The underlying idea first appeared in 
the study of rotational motion but was later used to classify various kinds of surfaces 
and to describe solutions of certain differential equations. In the early 1900s it was 
applied to matrices and matrix transformations, and today it has applications in such 
diverse fields as computer graphics, mechanical vibrations, heat flow, population 
dynamics, quantum mechanics, and economics, to name just a few. 


5.1 Eigenvalues and Eigenvectors 

In this section we will define the notions of “eigenvalue” and “eigenvector” and discuss 
some of their basic properties. 


Definition of Eigenvalue and Eigenvector

We begin with the main definition in this section.

DEFINITION 1 If A is an n × n matrix, then a nonzero vector x in $R^n$ is called an
eigenvector of A (or of the matrix operator $T_A$) if Ax is a scalar multiple of x; that is,

$$A\mathbf{x} = \lambda\mathbf{x}$$

for some scalar $\lambda$. The scalar $\lambda$ is called an eigenvalue of A (or of $T_A$), and x is said
to be an eigenvector corresponding to $\lambda$.


The requirement that an eigenvector be nonzero is imposed
to avoid the unimportant case $A\mathbf{0} = \lambda\mathbf{0}$, which holds for
every A and $\lambda$.


In general, the image of a vector x under multiplication by a square matrix A differs
from x in both magnitude and direction. However, in the special case where x is
an eigenvector of A, multiplication by A leaves the direction unchanged. For example,
in $R^2$ or $R^3$ multiplication by A maps each eigenvector x of A (if any) along the same
line through the origin as x. Depending on the sign and magnitude of the eigenvalue $\lambda$
corresponding to x, the operation $A\mathbf{x} = \lambda\mathbf{x}$ compresses or stretches x by a factor of $\lambda$,
with a reversal of direction in the case where $\lambda$ is negative (Figure 5.1.1).




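► EXAMPLE 1 Eigenvector of a 2 × 2 Matrix

The vector $\mathbf{x} = \begin{bmatrix} 1 \\ 2 \end{bmatrix}$ is an eigenvector of

$$A = \begin{bmatrix} 3 & 0 \\ 8 & -1 \end{bmatrix}$$

corresponding to the eigenvalue $\lambda = 3$, since

$$A\mathbf{x} = \begin{bmatrix} 3 & 0 \\ 8 & -1 \end{bmatrix}\begin{bmatrix} 1 \\ 2 \end{bmatrix} = \begin{bmatrix} 3 \\ 6 \end{bmatrix} = 3\mathbf{x}$$

Geometrically, multiplication by A has stretched the vector x by a factor of 3 (Figure
5.1.2). ◄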


Computing Eigenvalues and Eigenvectors

Our next objective is to obtain a general procedure for finding eigenvalues and eigenvectors
of an n × n matrix A. We will begin with the problem of finding the eigenvalues of A.

Note first that the equation $A\mathbf{x} = \lambda\mathbf{x}$ can be rewritten as $A\mathbf{x} = \lambda I\mathbf{x}$, or equivalently, as

$$(\lambda I - A)\mathbf{x} = \mathbf{0}$$


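Note that if $(A)_{ij} = a_{ij}$, then
formula (1) can be written in
expanded form as

$$\begin{vmatrix} \lambda - a_{11} & -a_{12} & \cdots & -a_{1n} \\ -a_{21} & \lambda - a_{22} & \cdots & -a_{2n} \\ \vdots & \vdots & & \vdots \\ -a_{n1} & -a_{n2} & \cdots & \lambda - a_{nn} \end{vmatrix} = 0$$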


For $\lambda$ to be an eigenvalue of A this equation must have a nonzero solution for x. But
it follows from parts (b) and (g) of Theorem 4.10.2 that this is so if and only if the
coefficient matrix $\lambda I - A$ has a zero determinant. Thus, we have the following result.


THEOREM 5.1.1 If A is an n × n matrix, then $\lambda$ is an eigenvalue of A if and only if it
satisfies the equation

$$\det(\lambda I - A) = 0 \tag{1}$$

This is called the characteristic equation of A.


► EXAMPLE 2 Finding Eigenvalues

In Example 1 we observed that $\lambda = 3$ is an eigenvalue of the matrix

$$A = \begin{bmatrix} 3 & 0 \\ 8 & -1 \end{bmatrix}$$

but we did not explain how we found it. Use the characteristic equation to find all
eigenvalues of this matrix.
Solution It follows from Formula (1) that the eigenvalues of A are the solutions of the
equation $\det(\lambda I - A) = 0$, which we can write as

$$\begin{vmatrix} \lambda - 3 & 0 \\ -8 & \lambda + 1 \end{vmatrix} = 0$$

from which we obtain

$$(\lambda - 3)(\lambda + 1) = 0 \tag{2}$$

This shows that the eigenvalues of A are $\lambda = 3$ and $\lambda = -1$. Thus, in addition to
the eigenvalue $\lambda = 3$ noted in Example 1, we have discovered a second eigenvalue
$\lambda = -1$. ◄
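The computation in Example 2 can be mirrored in a few lines of code. The sketch below is an illustration only: it relies on the standard fact, not stated in the text, that for a 2 × 2 matrix $\det(\lambda I - A) = \lambda^2 - \operatorname{tr}(A)\lambda + \det(A)$, and the function name `eigenvalues_2x2` is hypothetical.

```python
import math

def eigenvalues_2x2(a):
    """Real eigenvalues of a 2x2 matrix, found from det(lambda*I - A) = 0.

    Uses lambda^2 - trace*lambda + det = 0 and the quadratic formula.
    Returns an empty list when the roots are complex.
    """
    (p, q), (r, s) = a
    trace = p + s
    det = p * s - q * r
    disc = trace * trace - 4 * det
    if disc < 0:
        return []
    root = math.sqrt(disc)
    return sorted({(trace - root) / 2, (trace + root) / 2})

A = [[3, 0], [8, -1]]   # the matrix of Examples 1 and 2
eigs = eigenvalues_2x2(A)
print(eigs)
```

For this matrix the trace is 2 and the determinant is −3, so the quadratic is $\lambda^2 - 2\lambda - 3$, in agreement with (2).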


When the determinant $\det(\lambda I - A)$ in (1) is expanded, the characteristic equation
of A takes the form

$$\lambda^n + c_1\lambda^{n-1} + \cdots + c_n = 0 \tag{3}$$

where the left side of this equation is a polynomial of degree n in which the coefficient
of $\lambda^n$ is 1 (Exercise 37). The polynomial

$$p(\lambda) = \lambda^n + c_1\lambda^{n-1} + \cdots + c_n \tag{4}$$

is called the characteristic polynomial of A. For example, it follows from (2) that the
characteristic polynomial of the 2 × 2 matrix in Example 2 is

$$p(\lambda) = (\lambda - 3)(\lambda + 1) = \lambda^2 - 2\lambda - 3$$

which is a polynomial of degree 2.

Since a polynomial of degree n has at most n distinct roots, it follows from (3) that 
the characteristic equation of an n x n matrix A has at most n distinct solutions and 
consequently the matrix has at most n distinct eigenvalues. Since some of these solutions 
may be complex numbers, it is possible for a matrix to have complex eigenvalues, even if 
that matrix itself has real entries. We will discuss this issue in more detail later, but for 
now we will focus on examples in which the eigenvalues are real numbers. 


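► EXAMPLE 3 Eigenvalues of a 3 × 3 Matrix

Find the eigenvalues of

$$A = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 4 & -17 & 8 \end{bmatrix}$$

Solution The characteristic polynomial of A is

$$\det(\lambda I - A) = \det\begin{bmatrix} \lambda & -1 & 0 \\ 0 & \lambda & -1 \\ -4 & 17 & \lambda - 8 \end{bmatrix} = \lambda^3 - 8\lambda^2 + 17\lambda - 4$$

The eigenvalues of A must therefore satisfy the cubic equation

$$\lambda^3 - 8\lambda^2 + 17\lambda - 4 = 0 \tag{5}$$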
To solve this equation, we will begin by searching for integer solutions. This task can be
simplified by exploiting the fact that all integer solutions (if there are any) of a polynomial
equation with integer coefficients

$$\lambda^n + c_1\lambda^{n-1} + \cdots + c_n = 0$$

must be divisors of the constant term, $c_n$. Thus, the only possible integer solutions of (5)
are the divisors of −4, that is, ±1, ±2, ±4. Successively substituting these values in (5)
shows that $\lambda = 4$ is an integer solution and hence that $\lambda - 4$ is a factor of the left side
of (5). Dividing $\lambda - 4$ into $\lambda^3 - 8\lambda^2 + 17\lambda - 4$ shows that (5) can be rewritten as

$$(\lambda - 4)(\lambda^2 - 4\lambda + 1) = 0$$

Thus, the remaining solutions of (5) satisfy the quadratic equation

$$\lambda^2 - 4\lambda + 1 = 0$$

which can be solved by the quadratic formula. Thus, the eigenvalues of A are

$$\lambda = 4, \quad \lambda = 2 + \sqrt{3}, \quad \text{and} \quad \lambda = 2 - \sqrt{3}$$

In applications involving large matrices it is often not feasible
to compute the characteristic equation directly, so other
methods must be used to find eigenvalues. We will consider
such methods in Chapter 9.
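The integer-root search used in Example 3 can be sketched in code. The following is a minimal illustration for a monic cubic with integer coefficients and nonzero constant term; the function names `poly_eval`, `integer_roots`, and `deflate` are hypothetical, and the division step uses synthetic division, which the text does not name explicitly.

```python
import math

def poly_eval(coeffs, x):
    """Evaluate a polynomial (coefficients listed from highest power) by Horner's rule."""
    value = 0
    for c in coeffs:
        value = value * x + c
    return value

def integer_roots(coeffs):
    """Integer roots of a monic integer polynomial whose constant term is nonzero.

    Every integer root must divide the constant term, so only divisors are tested.
    """
    c_n = abs(coeffs[-1])
    divisors = [d for d in range(1, c_n + 1) if c_n % d == 0]
    candidates = [sign * d for d in divisors for sign in (1, -1)]
    return [x for x in candidates if poly_eval(coeffs, x) == 0]

def deflate(coeffs, root):
    """Divide the polynomial by (x - root) using synthetic division."""
    quotient = [coeffs[0]]
    for c in coeffs[1:-1]:
        quotient.append(c + root * quotient[-1])
    return quotient

p = [1, -8, 17, -4]                  # lambda^3 - 8*lambda^2 + 17*lambda - 4
roots = integer_roots(p)             # candidates are the divisors of -4
quotient = deflate(p, roots[0])      # remaining quadratic factor
a, b, c = quotient
disc = b * b - 4 * a * c
eigenvalues = [roots[0],
               (-b + math.sqrt(disc)) / (2 * a),
               (-b - math.sqrt(disc)) / (2 * a)]
print(roots, quotient, eigenvalues)
```

The only integer root found is 4, and the quotient corresponds to $\lambda^2 - 4\lambda + 1$, whose roots are $2 \pm \sqrt{3}$.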


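► EXAMPLE 4 Eigenvalues of an Upper Triangular Matrix

Find the eigenvalues of the upper triangular matrix

$$A = \begin{bmatrix} a_{11} & a_{12} & a_{13} & a_{14} \\ 0 & a_{22} & a_{23} & a_{24} \\ 0 & 0 & a_{33} & a_{34} \\ 0 & 0 & 0 & a_{44} \end{bmatrix}$$

Solution Recalling that the determinant of a triangular matrix is the product of the
entries on the main diagonal (Theorem 2.1.2), we obtain

$$\det(\lambda I - A) = \det\begin{bmatrix} \lambda - a_{11} & -a_{12} & -a_{13} & -a_{14} \\ 0 & \lambda - a_{22} & -a_{23} & -a_{24} \\ 0 & 0 & \lambda - a_{33} & -a_{34} \\ 0 & 0 & 0 & \lambda - a_{44} \end{bmatrix} = (\lambda - a_{11})(\lambda - a_{22})(\lambda - a_{33})(\lambda - a_{44})$$

Thus, the characteristic equation is

$$(\lambda - a_{11})(\lambda - a_{22})(\lambda - a_{33})(\lambda - a_{44}) = 0$$

and the eigenvalues are

$$\lambda = a_{11}, \quad \lambda = a_{22}, \quad \lambda = a_{33}, \quad \lambda = a_{44}$$

which are precisely the diagonal entries of A. ◄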


The following general theorem should be evident from the computations in the pre- 
ceding example. 


THEOREM 5.1.2 If A is an n × n triangular matrix (upper triangular, lower triangular,
or diagonal), then the eigenvalues of A are the entries on the main diagonal of A.
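Theorem 5.1.2 can be spot-checked numerically by evaluating $\det(\lambda I - A)$ at each diagonal entry of a triangular matrix. The sketch below uses a hypothetical 3 × 3 upper triangular matrix `U` chosen only for illustration, and a small cofactor-expansion determinant; neither comes from the text.

```python
def det(m):
    """Determinant by cofactor expansion along the first row (fine for small matrices)."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, entry in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(minor)
    return total

def char_poly_at(a, lam):
    """Value of det(lam*I - A) for a square matrix A."""
    n = len(a)
    shifted = [[(lam if i == j else 0) - a[i][j] for j in range(n)]
               for i in range(n)]
    return det(shifted)

# Hypothetical upper triangular matrix, used only as an illustration.
U = [[2, 5, -1],
     [0, -3, 4],
     [0, 0, 7]]

# The characteristic polynomial should vanish at every diagonal entry.
values_on_diagonal = [char_poly_at(U, U[i][i]) for i in range(3)]
value_elsewhere = char_poly_at(U, 1)   # 1 is not a diagonal entry
print(values_on_diagonal, value_elsewhere)
```

A value that is not on the diagonal, such as 1, gives a nonzero determinant, so it is not an eigenvalue of this matrix.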
Had Theorem 5.1.2 been available earlier, we could have
anticipated the result obtained in Example 2.


Finding Eigenvectors and
Bases for Eigenspaces


Notice that x = 0 is in every eigenspace but is not an
eigenvector (see Definition 1). In the exercises we will ask you to
show that this is the only vector that distinct eigenspaces have
in common.


► EXAMPLE 5 Eigenvalues of a LowerTriangular Matrix 

By inspection, the eigenvalues of the lower triangular matrix 


A = 



are X = y, X = |, and X = — I. <? 


0 0 



The following theorem gives some alternative ways of describing eigenvalues. 

THEOREM 5.1.3 If A is an n × n matrix, the following statements are equivalent.

(a) $\lambda$ is an eigenvalue of A.

(b) $\lambda$ is a solution of the characteristic equation $\det(\lambda I - A) = 0$.

(c) The system of equations $(\lambda I - A)\mathbf{x} = \mathbf{0}$ has nontrivial solutions.

(d) There is a nonzero vector x such that $A\mathbf{x} = \lambda\mathbf{x}$.


Now that we know how to find the eigenvalues of a matrix, we will consider the
problem of finding the corresponding eigenvectors. By definition, the eigenvectors of A
corresponding to an eigenvalue $\lambda$ are the nonzero vectors that satisfy

$$(\lambda I - A)\mathbf{x} = \mathbf{0}$$

Thus, we can find the eigenvectors of A corresponding to $\lambda$ by finding the nonzero
vectors in the solution space of this linear system. This solution space, which is called
the eigenspace of A corresponding to $\lambda$, can also be viewed as:

1. the null space of the matrix $\lambda I - A$

2. the kernel of the matrix operator $T_{\lambda I - A}: R^n \to R^n$

3. the set of vectors for which $A\mathbf{x} = \lambda\mathbf{x}$

► EXAMPLE 6 Bases for Eigenspaces

Find bases for the eigenspaces of the matrix

$$A = \begin{bmatrix} -1 & 3 \\ 2 & 0 \end{bmatrix}$$

Methods of linear algebra are used in the emerg- 
ing field of computerized face recognition. Researchers are working 
with the idea that every human face in a racial group is a combina- 
tion of a few dozen primary shapes. For example, by analyzing three- 
dimensional scans of many faces, researchers at Rockefeller University 
have produced both an average head shape in the Caucasian group- 
dubbed the meanhead (top row left in the figure to the left)— and a set 
of standardized variations from that shape, called eigenheads (15 of 
which are shown in the picture). These are so named because they are 
eigenvectors of a certain matrix that stores digitized facial information. 
Face shapes are represented mathematically as linear combinations of 
the eigenheads. 

[Image: © Dr. Joseph J. Atick, adapted from Scientific American] 
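Solution The characteristic equation of A is

$$\begin{vmatrix} \lambda + 1 & -3 \\ -2 & \lambda \end{vmatrix} = \lambda(\lambda + 1) - 6 = (\lambda - 2)(\lambda + 3) = 0$$

so the eigenvalues of A are $\lambda = 2$ and $\lambda = -3$. Thus, there are two eigenspaces of A,
one for each eigenvalue.

By definition,

$$\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$$

is an eigenvector of A corresponding to an eigenvalue $\lambda$ if and only if $(\lambda I - A)\mathbf{x} = \mathbf{0}$,
that is,

$$\begin{bmatrix} \lambda + 1 & -3 \\ -2 & \lambda \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$$

In the case where $\lambda = 2$ this equation becomes

$$\begin{bmatrix} 3 & -3 \\ -2 & 2 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$$

whose general solution is

$$x_1 = t, \quad x_2 = t$$

(verify). Since this can be written in matrix form as

$$\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} t \\ t \end{bmatrix} = t\begin{bmatrix} 1 \\ 1 \end{bmatrix}$$

it follows that

$$\begin{bmatrix} 1 \\ 1 \end{bmatrix}$$

is a basis for the eigenspace corresponding to $\lambda = 2$. We leave it for you to follow the
pattern of these computations and show that

$$\begin{bmatrix} -\tfrac{3}{2} \\ 1 \end{bmatrix}$$

is a basis for the eigenspace corresponding to $\lambda = -3$.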
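The two basis vectors found in Example 6 can be checked directly against the definition $A\mathbf{x} = \lambda\mathbf{x}$. This is a minimal sketch, assuming the matrix $A = \begin{bmatrix} -1 & 3 \\ 2 & 0 \end{bmatrix}$ read off from the entries of $\lambda I - A$ displayed in the solution; the helper `matvec` is illustrative.

```python
from fractions import Fraction

def matvec(a, x):
    """Multiply a matrix (list of rows) by a vector (list of entries)."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]

# Assumed matrix, inferred from the displayed lambda*I - A.
A = [[-1, 3], [2, 0]]

# (eigenvalue, claimed basis vector of its eigenspace)
pairs = [(2, [1, 1]),
         (-3, [Fraction(-3, 2), 1])]

# Each pair must satisfy A x = lambda x exactly.
checks = [matvec(A, v) == [lam * entry for entry in v] for lam, v in pairs]
print(checks)
```

Exact fractions are used so that the entry $-\tfrac{3}{2}$ is compared without rounding error.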


Figure 5.1.3 illustrates the geometric effect of multiplication by the matrix A in
Example 6. The eigenspace corresponding to $\lambda = 2$ is the line $L_1$ through the origin and
the point (1, 1), and the eigenspace corresponding to $\lambda = -3$ is the line $L_2$ through the
origin and the point $(-\tfrac{3}{2}, 1)$. As indicated in the figure, multiplication by A maps each
vector in $L_1$ back into $L_1$, scaling it by a factor of 2, and it maps each vector in $L_2$ back
into $L_2$, scaling it by a factor of −3.


► EXAMPLE 7 Eigenvectors and Bases for Eigenspaces

Find bases for the eigenspaces of

$$A = \begin{bmatrix} 0 & 0 & -2 \\ 1 & 2 & 1 \\ 1 & 0 & 3 \end{bmatrix}$$
► Figure 5.1.3 (the lines $L_1$ and $L_2$; panels: multiplication by $\lambda = 2$, multiplication by $\lambda = -3$)


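Solution The characteristic equation of A is $\lambda^3 - 5\lambda^2 + 8\lambda - 4 = 0$, or in factored
form, $(\lambda - 1)(\lambda - 2)^2 = 0$ (verify). Thus, the distinct eigenvalues of A are $\lambda = 1$ and
$\lambda = 2$, so there are two eigenspaces of A.

By definition,

$$\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}$$

is an eigenvector of A corresponding to $\lambda$ if and only if x is a nontrivial solution of
$(\lambda I - A)\mathbf{x} = \mathbf{0}$, or in matrix form,

$$\begin{bmatrix} \lambda & 0 & 2 \\ -1 & \lambda - 2 & -1 \\ -1 & 0 & \lambda - 3 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} \tag{6}$$

In the case where $\lambda = 2$, Formula (6) becomes

$$\begin{bmatrix} 2 & 0 & 2 \\ -1 & 0 & -1 \\ -1 & 0 & -1 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}$$

Solving this system using Gaussian elimination yields (verify)

$$x_1 = -s, \quad x_2 = t, \quad x_3 = s$$

Thus, the eigenvectors of A corresponding to $\lambda = 2$ are the nonzero vectors of the form

$$\mathbf{x} = \begin{bmatrix} -s \\ t \\ s \end{bmatrix} = \begin{bmatrix} -s \\ 0 \\ s \end{bmatrix} + \begin{bmatrix} 0 \\ t \\ 0 \end{bmatrix} = s\begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix} + t\begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}$$

Since

$$\begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix} \quad \text{and} \quad \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}$$

are linearly independent (why?), these vectors form a basis for the eigenspace
corresponding to $\lambda = 2$.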
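The basis just obtained for the eigenspace of $\lambda = 2$ can be verified by direct multiplication. The sketch below assumes the matrix $A$ of Example 7 as recovered from Formula (6); the helper `matvec` and the sample values of s and t are illustrative choices.

```python
def matvec(a, x):
    """Multiply a matrix (list of rows) by a vector (list of entries)."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]

A = [[0, 0, -2],
     [1, 2, 1],
     [1, 0, 3]]

basis = [[-1, 0, 1], [0, 1, 0]]   # claimed basis for the eigenspace of lambda = 2

# Each basis vector should be doubled by A.
basis_ok = [matvec(A, v) == [2 * entry for entry in v] for v in basis]

# Any combination s*(-1, 0, 1) + t*(0, 1, 0) lies in the same eigenspace.
s, t = 3, -5
x = [-s, t, s]
combo_ok = matvec(A, x) == [2 * entry for entry in x]
print(basis_ok, combo_ok)
```

The last check illustrates why the eigenspace is a subspace: every linear combination of the basis vectors is again sent to twice itself.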
If $\lambda = 1$, then (6) becomes

$$\begin{bmatrix} 1 & 0 & 2 \\ -1 & -1 & -1 \\ -1 & 0 & -2 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}$$

Solving this system yields (verify)

$$x_1 = -2s, \quad x_2 = s, \quad x_3 = s$$

Thus, the eigenvectors corresponding to $\lambda = 1$ are the nonzero vectors of the form

$$\begin{bmatrix} -2s \\ s \\ s \end{bmatrix} = s\begin{bmatrix} -2 \\ 1 \\ 1 \end{bmatrix} \quad \text{so that} \quad \begin{bmatrix} -2 \\ 1 \\ 1 \end{bmatrix}$$

is a basis for the eigenspace corresponding to $\lambda = 1$.


Eigenvalues and Invertibility

The next theorem establishes a relationship between the eigenvalues and the invertibility
of a matrix.


THEOREM 5.1.4 A square matrix A is invertible if and only if $\lambda = 0$ is not an
eigenvalue of A.


Proof Assume that A is an n × n matrix and observe first that $\lambda = 0$ is a solution of the
characteristic equation

$$\lambda^n + c_1\lambda^{n-1} + \cdots + c_n = 0$$

if and only if the constant term $c_n$ is zero. Thus, it suffices to prove that A is invertible
if and only if $c_n \neq 0$. But

$$\det(\lambda I - A) = \lambda^n + c_1\lambda^{n-1} + \cdots + c_n$$

or, on setting $\lambda = 0$,

$$\det(-A) = c_n \quad \text{or} \quad (-1)^n \det(A) = c_n$$

It follows from the last equation that $\det(A) = 0$ if and only if $c_n = 0$, and this in turn
implies that A is invertible if and only if $c_n \neq 0$.


► EXAMPLE 8 Eigenvalues and Invertibility

The matrix A in Example 7 is invertible since it has eigenvalues $\lambda = 1$ and $\lambda = 2$,
neither of which is zero. We leave it for you to check this conclusion by showing that
$\det(A) \neq 0$. ◄


More on the Equivalence Theorem

As our final result in this section, we will use Theorem 5.1.4 to add one additional part 
to Theorem 4.10.2. 
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Eigenvalues of General 
LinearTransformations 


CALCULUS REQUIRED 


In vector spaces of functions 
eigenvectors are commonly re- 
ferred to as eigenfunctions . 


Equivalent Statements 

If A is an n x n matrix, then the following statements are equivalent. 

(a) A is invertible. 

(b) Ax — 0 has only the trivial solution. 

(c) The reduced row echelon form of A is /„. 

( d ) A is expressible as a product of elementary matrices. 

(e) Ax = b is consistent for every n x 1 matrix b. 

(/) Ax = b has exactly one solution for every n x 1 matrix b. 

(g) det(A) ^ 0. 

(h) The column vectors of A are linearly independent. 

(/) The row vectors of A are linearly independent. 

( j ) The column vectors of A span R". 

(k) The row vectors of A span R". 

(/) The column vectors of A form a basis for R". 

(m) The row vectors of A form a basis for R n . 

(n) A has rank n. 

(o) A has nullity 0. 

(p) The orthogonal complement of the null space of A is R". 

(q) The orthogonal complement of the row space of A is {0}. 

(r) The kernel ofT A is {0}. 

(,?) The range ofT A is R n . 

(?) T A is one-to-one. 

(u) X = 0 is not an eigenvalue of A. 


Thus far, we have only defined eigenvalues and eigenvectors for matrices and linear 
operators on R" . The following definition, which parallels Definition 1, extends this 
concept to general vector spaces. 


DEFINITION 2 If $T: V \to V$ is a linear operator on a vector space V, then a nonzero
vector x in V is called an eigenvector of T if T(x) is a scalar multiple of x; that is,

$T(x) = \lambda x$

for some scalar $\lambda$. The scalar $\lambda$ is called an eigenvalue of T, and x is said to be an
eigenvector corresponding to $\lambda$.


As with matrix operators, we call the kernel of the operator $\lambda I - T$ the eigenspace of
T corresponding to $\lambda$. Stated another way, this is the subspace of all vectors in V for
which $T(x) = \lambda x$.


EXAMPLE 9 Eigenvalue of a Differentiation Operator

If $D: C^\infty \to C^\infty$ is the differentiation operator on the vector space of functions with
continuous derivatives of all orders on the interval $(-\infty, \infty)$, and if $\lambda$ is a constant, then

$D(e^{\lambda x}) = \lambda e^{\lambda x}$

so that $\lambda$ is an eigenvalue of D and $e^{\lambda x}$ is a corresponding eigenvector.
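The eigenfunction relation in Example 9 can be spot-checked numerically. The sketch below is only an illustration: it replaces the exact derivative with a central-difference approximation, and the constant `lam`, the sample point `x0`, and the step size `h` are arbitrary assumptions, not values from the text.

```python
import math

def derivative(f, x, h=1e-5):
    """Central-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

lam = 0.5   # arbitrary constant chosen for illustration
x0 = 1.2    # arbitrary sample point

def f(x):
    """The candidate eigenfunction e^(lam * x)."""
    return math.exp(lam * x)

lhs = derivative(f, x0)   # approximates D(e^(lam x)) at x0
rhs = lam * f(x0)         # lam * e^(lam x0)

# The two sides should agree up to the error of the finite-difference scheme.
close = math.isclose(lhs, rhs, rel_tol=1e-8)
```

A numerical agreement at one point is of course not a proof; the identity itself follows from the chain rule.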
Exercise Set 5.1


In Exercises 1-4, confirm by multiplication that x is an eigen- 
vector of A , and find the corresponding eigenvalue. 


1 . A = 


1' 

-1 


2. $A = \begin{bmatrix} 5 & -1 \\ 1 & 3 \end{bmatrix}$


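3. $A = \begin{bmatrix} 4 & 0 & 1 \\ 2 & 3 & 2 \\ 1 & 0 & 4 \end{bmatrix}; \quad x = \begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix}$

4. $A = \begin{bmatrix} 2 & -1 & -1 \\ -1 & 2 & -1 \\ -1 & -1 & 2 \end{bmatrix}; \quad x = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$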


In each part of Exercises 5-6, find the characteristic equation,
the eigenvalues, and bases for the eigenspaces of the matrix.


In Exercises 15-16, find the eigenvalues and a basis for each 
eigenspace of the linear operator defined by the stated formula. 
[Suggestion: Work with the standard matrix for the operator.] 

15. $T(x, y) = (x + 4y, 2x + 3y)$

16. $T(x, y, z) = (2x - y - z, x - z, -x + y + 2z)$

17. (Calculus required) Let $D^2: C^\infty(-\infty, \infty) \to C^\infty(-\infty, \infty)$ be the
operator that maps a function into its second derivative.

(a) Show that $D^2$ is linear.

(b) Show that if $\omega$ is a positive constant, then $\sin \sqrt{\omega}\,x$ and
$\cos \sqrt{\omega}\,x$ are eigenvectors of $D^2$, and find their corresponding eigenvalues.

18. (Calculus required) Let $D^2: C^\infty \to C^\infty$ be the linear operator
in Exercise 17. Show that if $\omega$ is a positive constant, then
$\sinh \sqrt{\omega}\,x$ and $\cosh \sqrt{\omega}\,x$ are eigenvectors of $D^2$, and find their
corresponding eigenvalues.



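5. (a) $\begin{bmatrix} 1 & 4 \\ 2 & 3 \end{bmatrix}$ (b) $\begin{bmatrix} -2 & 7 \\ 1 & 2 \end{bmatrix}$ (c) $\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$ (d) $\begin{bmatrix} 1 & -2 \\ 0 & 1 \end{bmatrix}$

6. (a) $\begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}$ (b) $\begin{bmatrix} 2 & -3 \\ 0 & 2 \end{bmatrix}$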



'2 

O' 


1 


2' 

(c) 

0 

2 

(d) 

-2 

_ 

1 


In Exercises 7-12, find the characteristic equation, the eigenvalues,
and bases for the eigenspaces of the matrix.



4 

0 

l" 



1 


0 


7. 

-2 

1 

0 


8. 

0 


0 



-2 

0 

1 



-2 


0 



'6 

3 



3" 


"o 

1 

l" 


9. 

0 

-2 

0 

10. 

1 

0 

1 



1 

0 


3 


1 

1 

0 



"4 

0 


l" 


"l 

-3 


3~ 

11. 

0 

3 

0 

12. 

3 

-5 


3 


1 

0 


2 


6 

-6 


4 


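In Exercises 13-14, find the characteristic equation of the
matrix by inspection.

13. $\begin{bmatrix} 3 & 0 & 0 \\ -2 & 7 & 0 \\ 4 & 8 & 1 \end{bmatrix}$

14. $\begin{bmatrix} 9 & -8 & 6 & 3 \\ 0 & -1 & 0 & 0 \\ 0 & 0 & 3 & 0 \\ 0 & 0 & 0 & 7 \end{bmatrix}$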


In each part of Exercises 19-20, find the eigenvalues and the
corresponding eigenspaces of the stated matrix operator on $R^2$.
Refer to the tables in Section 4.9 and use geometric reasoning to
find the answers. No computations are needed.

19. (a) Reflection about the line y = x.

(b) Orthogonal projection onto the x-axis.

(c) Rotation about the origin through a positive angle of 90°.

(d) Contraction with factor k ($0 < k < 1$).

(e) Shear in the x-direction by a factor k ($k \neq 0$).

20. (a) Reflection about the y-axis.

(b) Rotation about the origin through a positive angle of 180°.

(c) Dilation with factor k ($k > 1$).

(d) Expansion in the y-direction with factor k ($k > 1$).

(e) Shear in the y-direction by a factor k ($k \neq 0$).

In each part of Exercises 21-22, find the eigenvalues and the
corresponding eigenspaces of the stated matrix operator on $R^3$.
Refer to the tables in Section 4.9 and use geometric reasoning to
find the answers. No computations are needed.

21. (a) Reflection about the xy-plane.

(b) Orthogonal projection onto the xz-plane.

(c) Counterclockwise rotation about the positive x-axis
through an angle of 90°.

(d) Contraction with factor k ($0 < k < 1$).

22. (a) Reflection about the xz-plane.

(b) Orthogonal projection onto the yz-plane.

(c) Counterclockwise rotation about the positive y-axis
through an angle of 180°.

(d) Dilation with factor k ($k > 1$).
23. Let A be a $2 \times 2$ matrix, and call a line through the origin of
$R^2$ invariant under A if Ax lies on the line when x does. Find
equations for all lines in $R^2$, if any, that are invariant under
the given matrix.


(a) A = 


'4 

2 



(b) A = 


r 

o 


24. Find det(A) given that A has $p(\lambda)$ as its characteristic polynomial.

(a) $p(\lambda) = \lambda^3 - 2\lambda^2 + \lambda + 5$

(b) $p(\lambda) = \lambda^4 - \lambda^3 + 1$

[Hint: See the proof of Theorem 5.1.4.]

25. Suppose that the characteristic polynomial of some matrix A
is found to be $p(\lambda) = (\lambda - 1)(\lambda - 3)^2(\lambda - 4)^3$. In each part,
answer the question and explain your reasoning.

(a) What is the size of A?

(b) Is A invertible?


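30. Let A be the matrix in Exercise 29. Show that if $b \neq 0$, then

\[ x_1 = \begin{bmatrix} -b \\ a - \lambda_1 \end{bmatrix} \quad \text{and} \quad x_2 = \begin{bmatrix} -b \\ a - \lambda_2 \end{bmatrix} \]

are eigenvectors of A that correspond, respectively, to the
eigenvalues

\[ \lambda_1 = \tfrac{1}{2}\left[(a + d) + \sqrt{(a - d)^2 + 4bc}\right] \quad \text{and} \quad \lambda_2 = \tfrac{1}{2}\left[(a + d) - \sqrt{(a - d)^2 + 4bc}\right] \]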


31. Use the result of Exercise 28 to prove that if

$p(\lambda) = \lambda^2 + c_1\lambda + c_2$

is the characteristic polynomial of a $2 \times 2$ matrix, then

$p(A) = A^2 + c_1 A + c_2 I = 0$

(Stated informally, A satisfies its characteristic equation. This
result is true as well for $n \times n$ matrices.)


(c) How many eigenspaces does A have? 

26. The eigenvectors that we have been studying are sometimes
called right eigenvectors to distinguish them from left eigenvectors,
which are $n \times 1$ column matrices x that satisfy the
equation $x^T A = \mu x^T$ for some scalar $\mu$. For a given matrix A,
how are the right eigenvectors and their corresponding eigenvalues
related to the left eigenvectors and their corresponding
eigenvalues?


32. Prove: If a, b, c, and d are integers such that $a + b = c + d$,
then

$A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$

has integer eigenvalues.

33. Prove: If $\lambda$ is an eigenvalue of an invertible matrix A and x is
a corresponding eigenvector, then $1/\lambda$ is an eigenvalue of $A^{-1}$
and x is a corresponding eigenvector.


27. Find a 3 x 3 matrix A that has eigenvalues 1,-1, and 0, and 
for which 


f 


T 


1" 

-1 

, 

1 

, 

-1 

1 


0 


0 


are their corresponding eigenvectors. 

Working with Proofs 

28. Prove that the characteristic equation of a $2 \times 2$ matrix A can
be expressed as $\lambda^2 - \operatorname{tr}(A)\lambda + \det(A) = 0$, where tr(A) is the
trace of A.

29. Use the result in Exercise 28 to show that if

$A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$

then the solutions of the characteristic equation of A are

\[ \lambda = \tfrac{1}{2}\left[(a + d) \pm \sqrt{(a - d)^2 + 4bc}\right] \]

Use this result to show that A has

(a) two distinct real eigenvalues if $(a - d)^2 + 4bc > 0$.

(b) two repeated real eigenvalues if $(a - d)^2 + 4bc = 0$.

(c) complex conjugate eigenvalues if $(a - d)^2 + 4bc < 0$.


34. Prove: If $\lambda$ is an eigenvalue of A, x is a corresponding eigenvector,
and s is a scalar, then $\lambda - s$ is an eigenvalue of $A - sI$
and x is a corresponding eigenvector.

35. Prove: If $\lambda$ is an eigenvalue of A and x is a corresponding
eigenvector, then $s\lambda$ is an eigenvalue of $sA$ for every scalar s
and x is a corresponding eigenvector.


36. Find the eigenvalues and bases for the eigenspaces of

$A = \begin{bmatrix} -2 & 2 & 3 \\ -2 & 3 & 2 \\ -4 & 2 & 5 \end{bmatrix}$

and then use Exercises 33 and 34 to find the eigenvalues and
bases for the eigenspaces of

(a) $A^{-1}$ (b) $A - 3I$ (c) $A + 2I$


37. Prove that the characteristic polynomial of an $n \times n$ matrix A
has degree n and that the coefficient of $\lambda^n$ in that polynomial
is 1.


38. (a) Prove that if A is a square matrix, then A and $A^T$ have
the same eigenvalues. [Hint: Look at the characteristic
equation $\det(\lambda I - A) = 0$.]

(b) Show that A and $A^T$ need not have the same eigenspaces.
[Hint: Use the result in Exercise 30 to find a $2 \times 2$ matrix
for which A and $A^T$ have different eigenspaces.]
39. Prove that the intersection of any two distinct eigenspaces of
a matrix A is $\{0\}$.

True-False Exercises

TF. In parts (a)-(f) determine whether the statement is true or
false, and justify your answer.

(a) If A is a square matrix and $Ax = \lambda x$ for some nonzero scalar
$\lambda$, then x is an eigenvector of A.

(b) If $\lambda$ is an eigenvalue of a matrix A, then the linear system
$(\lambda I - A)x = 0$ has only the trivial solution.

(c) If the characteristic polynomial of a matrix A is
$p(\lambda) = \lambda^2 + 1$, then A is invertible.

(d) If $\lambda$ is an eigenvalue of a matrix A, then the eigenspace of A
corresponding to $\lambda$ is the set of eigenvectors of A corresponding
to $\lambda$.


Working with Technology 

T1. For the given matrix A, find the characteristic polynomial
and the eigenvalues, and then use the method of Example 7 to find
bases for the eigenspaces.

\[ A = \begin{bmatrix} -8 & 33 & 38 & 173 & -30 \\ 0 & 0 & -1 & -4 & 0 \\ 0 & 0 & -5 & -25 & 1 \\ 0 & 0 & 1 & 5 & 0 \\ 4 & -16 & -19 & -86 & 15 \end{bmatrix} \]

T2. The Cayley-Hamilton Theorem states that every square matrix
satisfies its characteristic equation; that is, if A is an $n \times n$
matrix whose characteristic equation is

$\lambda^n + c_1\lambda^{n-1} + \cdots + c_n = 0$

then $A^n + c_1 A^{n-1} + \cdots + c_n I = 0$.

(a) Verify the Cayley-Hamilton Theorem for the matrix 


(e) The eigenvalues of a matrix A are the same as the eigenvalues 
of the reduced row echelon form of A. 

(f ) If 0 is an eigenvalue of a matrix A, then the set of columns of 
A is linearly independent. 


$A = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 2 & -5 & 4 \end{bmatrix}$


(b) Use the result in Exercise 28 to prove the Cayley-Hamilton 
Theorem for 2 x 2 matrices. 


5.2 Diagonalization 

In this section we will be concerned with the problem of finding a basis for $R^n$ that consists
of eigenvectors of an $n \times n$ matrix A. Such bases can be used to study geometric properties
of A and to simplify various numerical computations. These bases are also of physical
significance in a wide variety of applications, some of which will be considered later in this
text.


The Matrix Diagonalization Problem

Products of the form $P^{-1}AP$ in which A and P are $n \times n$ matrices and P is invertible
will be our main topic of study in this section. There are various ways to think about
such products, one of which is to view them as transformations

$A \to P^{-1}AP$

in which the matrix A is mapped into the matrix $P^{-1}AP$. These are called similarity
transformations. Such transformations are important because they preserve many
properties of the matrix A. For example, if we let $B = P^{-1}AP$, then A and B have the same
determinant since

\[ \det(B) = \det(P^{-1}AP) = \det(P^{-1})\det(A)\det(P) = \frac{1}{\det(P)}\det(A)\det(P) = \det(A) \]
In general, any property that is preserved by a similarity transformation is called a
similarity invariant and is said to be invariant under similarity. Table 1 lists the most
important similarity invariants. The proofs of some of these are given as exercises.


Table 1 Similarity Invariants

Property | Description
--- | ---
Determinant | A and $P^{-1}AP$ have the same determinant.
Invertibility | A is invertible if and only if $P^{-1}AP$ is invertible.
Rank | A and $P^{-1}AP$ have the same rank.
Nullity | A and $P^{-1}AP$ have the same nullity.
Trace | A and $P^{-1}AP$ have the same trace.
Characteristic polynomial | A and $P^{-1}AP$ have the same characteristic polynomial.
Eigenvalues | A and $P^{-1}AP$ have the same eigenvalues.
Eigenspace dimension | If $\lambda$ is an eigenvalue of A (and hence of $P^{-1}AP$), then the eigenspace of A corresponding to $\lambda$ and the eigenspace of $P^{-1}AP$ corresponding to $\lambda$ have the same dimension.
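Two rows of Table 1 (determinant and trace) can be spot-checked on a small case. The sketch below is an illustration only: the matrices `A` and `P` are arbitrary samples chosen for this check, and the helper names (`matmul`, `det2`, `inv2`, `trace`) are hypothetical. Exact rational arithmetic from the standard library is used so that no rounding is involved.

```python
from fractions import Fraction

def matmul(x, y):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]

def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def inv2(m):
    """Inverse of a 2x2 matrix via the adjugate formula."""
    d = Fraction(det2(m))
    if d == 0:
        raise ValueError("matrix is not invertible")
    return [[m[1][1] / d, -m[0][1] / d],
            [-m[1][0] / d, m[0][0] / d]]

def trace(m):
    """Sum of the diagonal entries."""
    return sum(m[i][i] for i in range(len(m)))

A = [[1, 4], [2, 3]]   # sample matrix (assumption for illustration)
P = [[1, 2], [1, 3]]   # sample invertible matrix (assumption for illustration)

B = matmul(matmul(inv2(P), A), P)   # B = P^{-1} A P

same_det = det2(B) == det2(A)
same_trace = trace(B) == trace(A)
```

For these samples B works out to a different matrix from A, yet its determinant and trace match those of A, as the table states.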


We will find the following terminology useful in our study of similarity transforma- 
tions. 


DEFINITION 1 If A and B are square matrices, then we say that B is similar to A if
there is an invertible matrix P such that $B = P^{-1}AP$.


Note that if B is similar to A, then it is also true that A is similar to B since we can
express A as $A = Q^{-1}BQ$ by taking $Q = P^{-1}$. This being the case, we will usually say
that A and B are similar matrices if either is similar to the other.

Because diagonal matrices have such a simple form, it is natural to inquire whether 
a given n x n matrix A is similar to a matrix of this type. Should this turn out to be 
the case, and should we be able to actually find a diagonal matrix D that is similar to 
A, then we would be able to ascertain many of the similarity invariant properties of A 
directly from the diagonal entries of D. For example, the diagonal entries of D will 
be the eigenvalues of A (Theorem 5.1.2), and the product of the diagonal entries of D 
will be the determinant of A (Theorem 2.1.2). This leads us to introduce the following 
terminology. 


DEFINITION 2 A square matrix A is said to be diagonalizable if it is similar to some
diagonal matrix; that is, if there exists an invertible matrix P such that $P^{-1}AP$ is
diagonal. In this case the matrix P is said to diagonalize A.


The following theorem and the ideas used in its proof will provide us with a roadmap 
for devising a technique for determining whether a matrix is diagonalizable and, if so, 
for finding a matrix P that will perform the diagonalization. 
Part (b) of Theorem 5.2.1 is equivalent to saying that there
is a basis for $R^n$ consisting of eigenvectors of A. Why?


THEOREM 5.2.1 If A is an $n \times n$ matrix, the following statements are equivalent.

(a) A is diagonalizable. 

(b) A has n linearly independent eigenvectors. 


Proof (a) $\Rightarrow$ (b) Since A is assumed to be diagonalizable, it follows that there exist an
invertible matrix P and a diagonal matrix D such that $P^{-1}AP = D$ or, equivalently,

$AP = PD$    (1)

If we denote the column vectors of P by $p_1, p_2, \ldots, p_n$, and if we assume that the
diagonal entries of D are $\lambda_1, \lambda_2, \ldots, \lambda_n$, then by Formula (6) of Section 1.3 the left side
of (1) can be expressed as

$AP = A[p_1 \;\; p_2 \;\; \cdots \;\; p_n] = [Ap_1 \;\; Ap_2 \;\; \cdots \;\; Ap_n]$

and, as noted in the comment following Example 1 of Section 1.7, the right side of (1)
can be expressed as

$PD = [\lambda_1 p_1 \;\; \lambda_2 p_2 \;\; \cdots \;\; \lambda_n p_n]$

Thus, it follows from (1) that

$Ap_1 = \lambda_1 p_1, \quad Ap_2 = \lambda_2 p_2, \quad \ldots, \quad Ap_n = \lambda_n p_n$    (2)

Since P is invertible, we know from Theorem 5.1.5 that its column vectors $p_1, p_2, \ldots, p_n$
are linearly independent (and hence nonzero). Thus, it follows from (2) that these n
column vectors are eigenvectors of A.

Proof (b) $\Rightarrow$ (a) Assume that A has n linearly independent eigenvectors, $p_1, p_2, \ldots, p_n$,
and that $\lambda_1, \lambda_2, \ldots, \lambda_n$ are the corresponding eigenvalues. If we let

$P = [p_1 \;\; p_2 \;\; \cdots \;\; p_n]$

and if we let D be the diagonal matrix that has $\lambda_1, \lambda_2, \ldots, \lambda_n$ as its successive diagonal
entries, then

$AP = A[p_1 \;\; p_2 \;\; \cdots \;\; p_n] = [Ap_1 \;\; Ap_2 \;\; \cdots \;\; Ap_n] = [\lambda_1 p_1 \;\; \lambda_2 p_2 \;\; \cdots \;\; \lambda_n p_n] = PD$

Since the column vectors of P are linearly independent, it follows from Theorem 5.1.5
that P is invertible, so that this last equation can be rewritten as $P^{-1}AP = D$, which
shows that A is diagonalizable.

Whereas Theorem 5.2.1 tells us that we need to find n linearly independent eigen- 
vectors to diagonalize a matrix, the following theorem tells us where such vectors might 
be found. Part (a) is proved at the end of this section, and part ( b ) is an immediate 
consequence of part (a) and Theorem 5.2.1 (why?). 


THEOREM 5.2.2

(a) If $\lambda_1, \lambda_2, \ldots, \lambda_k$ are distinct eigenvalues of a matrix A, and if $v_1, v_2, \ldots, v_k$ are
corresponding eigenvectors, then $\{v_1, v_2, \ldots, v_k\}$ is a linearly independent set.

(b) An $n \times n$ matrix with n distinct eigenvalues is diagonalizable.


Remark Part (a) of Theorem 5.2.2 is a special case of a more general result: Specifically, if
$\lambda_1, \lambda_2, \ldots, \lambda_k$ are distinct eigenvalues, and if $S_1, S_2, \ldots, S_k$ are corresponding sets of linearly
independent eigenvectors, then the union of these sets is linearly independent.
Procedure for Diagonalizing a Matrix


Theorem 5.2.1 guarantees that an $n \times n$ matrix A with n linearly independent
eigenvectors is diagonalizable, and the proof of that theorem together with Theorem 5.2.2
suggests the following procedure for diagonalizing A.


A Procedure for Diagonalizing an n x n Matrix 

Step 1. Determine first whether the matrix is actually diagonalizable by searching for 
n linearly independent eigenvectors. One way to do this is to find a basis for 
each eigenspace and count the total number of vectors obtained. If there is 
a total of n vectors, then the matrix is diagonalizable, and if the total is less 
than n, then it is not. 

Step 2. If you ascertained that the matrix is diagonalizable, then form the matrix
$P = [p_1 \;\; p_2 \;\; \cdots \;\; p_n]$ whose column vectors are the n basis vectors you
obtained in Step 1.

Step 3. $P^{-1}AP$ will be a diagonal matrix whose successive diagonal entries are the
eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$ that correspond to the successive columns of P.


EXAMPLE 1 Finding a Matrix P That Diagonalizes a Matrix A 

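Find a matrix P that diagonalizes

\[ A = \begin{bmatrix} 0 & 0 & -2 \\ 1 & 2 & 1 \\ 1 & 0 & 3 \end{bmatrix} \]

Solution In Example 7 of the preceding section we found the characteristic equation of
A to be

$(\lambda - 1)(\lambda - 2)^2 = 0$

and we found the following bases for the eigenspaces:

\[ \lambda = 2: \quad p_1 = \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix}, \quad p_2 = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}; \qquad \lambda = 1: \quad p_3 = \begin{bmatrix} -2 \\ 1 \\ 1 \end{bmatrix} \]

There are three basis vectors in total, so the matrix

\[ P = \begin{bmatrix} -1 & 0 & -2 \\ 0 & 1 & 1 \\ 1 & 0 & 1 \end{bmatrix} \]

diagonalizes A. As a check, you should verify that

\[ P^{-1}AP = \begin{bmatrix} 1 & 0 & 2 \\ 1 & 1 & 1 \\ -1 & 0 & -1 \end{bmatrix} \begin{bmatrix} 0 & 0 & -2 \\ 1 & 2 & 1 \\ 1 & 0 & 3 \end{bmatrix} \begin{bmatrix} -1 & 0 & -2 \\ 0 & 1 & 1 \\ 1 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 2 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 1 \end{bmatrix} \]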


In general, there is no preferred order for the columns of P. Since the ith diagonal
entry of $P^{-1}AP$ is an eigenvalue for the ith column vector of P, changing the order of
the columns of P just changes the order of the eigenvalues on the diagonal of $P^{-1}AP$.
Thus, had we written

\[ P = \begin{bmatrix} -1 & -2 & 0 \\ 0 & 1 & 1 \\ 1 & 1 & 0 \end{bmatrix} \]

in the preceding example, we would have obtained

\[ P^{-1}AP = \begin{bmatrix} 2 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 2 \end{bmatrix} \]
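The verification requested in Example 1, and the effect of reordering the columns of P, can be sketched without computing an inverse. The snippet below is a minimal illustration that uses the equivalent form AP = PD from Formula (1), together with a nonzero determinant to confirm that P is invertible; the helper names `matmul`, `det3`, and `diagonalizes` are hypothetical.

```python
def matmul(x, y):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]

def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def diagonalizes(p, a, d):
    """True if p is invertible and a p = p d, which is equivalent to p^{-1} a p = d."""
    return det3(p) != 0 and matmul(a, p) == matmul(p, d)

A = [[0, 0, -2], [1, 2, 1], [1, 0, 3]]

# Columns p1, p2, p3 in the order used in Example 1.
P = [[-1, 0, -2], [0, 1, 1], [1, 0, 1]]
D = [[2, 0, 0], [0, 2, 0], [0, 0, 1]]

# Columns reordered as p1, p3, p2; the diagonal entries move with them.
P_reordered = [[-1, -2, 0], [0, 1, 1], [1, 1, 0]]
D_reordered = [[2, 0, 0], [0, 1, 0], [0, 0, 2]]

ok_original = diagonalizes(P, A, D)
ok_reordered = diagonalizes(P_reordered, A, D_reordered)
```

Checking AP = PD keeps the arithmetic in integers; multiplying on the left by $P^{-1}$ then gives the diagonal form stated in the example.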


EXAMPLE 2 A Matrix That Is Not Diagonalizable 

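Show that the following matrix is not diagonalizable:

\[ A = \begin{bmatrix} 1 & 0 & 0 \\ 1 & 2 & 0 \\ -3 & 5 & 2 \end{bmatrix} \]

Solution The characteristic polynomial of A is

\[ \det(\lambda I - A) = \begin{vmatrix} \lambda - 1 & 0 & 0 \\ -1 & \lambda - 2 & 0 \\ 3 & -5 & \lambda - 2 \end{vmatrix} = (\lambda - 1)(\lambda - 2)^2 \]

so the characteristic equation is

$(\lambda - 1)(\lambda - 2)^2 = 0$

and the distinct eigenvalues of A are $\lambda = 1$ and $\lambda = 2$. We leave it for you to show that
bases for the eigenspaces are

\[ \lambda = 1: \quad p_1 = \begin{bmatrix} \tfrac{1}{8} \\ -\tfrac{1}{8} \\ 1 \end{bmatrix}; \qquad \lambda = 2: \quad p_2 = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} \]

Since A is a $3 \times 3$ matrix and there are only two basis vectors in total, A is not
diagonalizable.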


Alternative Solution If you are interested only in determining whether a matrix is
diagonalizable and not in actually finding a diagonalizing matrix P, then it is not
necessary to compute bases for the eigenspaces; it suffices to find the dimensions of the
eigenspaces. For this example, the eigenspace corresponding to $\lambda = 1$ is the solution
space of the system

\[ \begin{bmatrix} 0 & 0 & 0 \\ -1 & -1 & 0 \\ 3 & -5 & -1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} \]

Since the coefficient matrix has rank 2 (verify), the nullity of this matrix is 1 by
Theorem 4.8.2, and hence the eigenspace corresponding to $\lambda = 1$ is one-dimensional.

The eigenspace corresponding to $\lambda = 2$ is the solution space of the system

\[ \begin{bmatrix} 1 & 0 & 0 \\ -1 & 0 & 0 \\ 3 & -5 & 0 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} \]

This coefficient matrix also has rank 2 and nullity 1 (verify), so the eigenspace
corresponding to $\lambda = 2$ is also one-dimensional. Since the eigenspaces produce a total of two
basis vectors, and since three are needed, the matrix A is not diagonalizable.
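The rank and nullity counts marked "verify" in the alternative solution can be sketched as follows. This is a minimal illustration, assuming the eigenvalues 1 and 2 are already known; the function names `rank` and `lambda_i_minus_a` are hypothetical, and the rank is found by forward elimination in exact rational arithmetic.

```python
from fractions import Fraction

def rank(matrix):
    """Rank of a matrix, computed by forward elimination with exact fractions."""
    m = [[Fraction(v) for v in row] for row in matrix]
    rows, cols = len(m), len(m[0])
    r = 0  # number of pivots found so far
    for c in range(cols):
        pivot = next((i for i in range(r, rows) if m[i][c] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, rows):
            factor = m[i][c] / m[r][c]
            m[i] = [x - factor * y for x, y in zip(m[i], m[r])]
        r += 1
        if r == rows:
            break
    return r

def lambda_i_minus_a(a, lam):
    """Form the matrix lam*I - a."""
    n = len(a)
    return [[(lam if i == j else 0) - a[i][j] for j in range(n)] for i in range(n)]

A = [[1, 0, 0], [1, 2, 0], [-3, 5, 2]]
n = len(A)

# Dimension of each eigenspace = nullity of (lam*I - A) = n - rank.
dims = {lam: n - rank(lambda_i_minus_a(A, lam)) for lam in (1, 2)}

# Diagonalizable exactly when the eigenspace dimensions add up to n.
diagonalizable = sum(dims.values()) == n
```

Both eigenspaces come out one-dimensional, so only two independent eigenvectors are available and `diagonalizable` is false, in agreement with the example.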
Eigenvalues of Powers of a Matrix


Note that diagonalizability is not a requirement in Theorem 5.2.3.


► EXAMPLE 3 Recognizing Diagonalizability 


We saw in Example 3 of the preceding section that

\[ A = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 4 & -17 & 8 \end{bmatrix} \]

has three distinct eigenvalues: $\lambda = 4$, $\lambda = 2 + \sqrt{3}$, and $\lambda = 2 - \sqrt{3}$. Therefore, A is
diagonalizable and

\[ P^{-1}AP = \begin{bmatrix} 4 & 0 & 0 \\ 0 & 2 + \sqrt{3} & 0 \\ 0 & 0 & 2 - \sqrt{3} \end{bmatrix} \]

for some invertible matrix P. If needed, the matrix P can be found using the method
shown in Example 1 of this section.


► EXAMPLE 4 Diagonalizability of Triangular Matrices 


From Theorem 5.1.2, the eigenvalues of a triangular matrix are the entries on its main
diagonal. Thus, a triangular matrix with distinct entries on the main diagonal is
diagonalizable. For example,

\[ A = \begin{bmatrix} -1 & 2 & 4 & 0 \\ 0 & 3 & 1 & 7 \\ 0 & 0 & 5 & 8 \\ 0 & 0 & 0 & -2 \end{bmatrix} \]

is a diagonalizable matrix with eigenvalues $\lambda_1 = -1$, $\lambda_2 = 3$, $\lambda_3 = 5$, $\lambda_4 = -2$.


Since there are many applications in which it is necessary to compute high powers of a
square matrix A, we will now turn our attention to that important problem. As we will
see, the most efficient way to compute $A^k$, particularly for large values of k, is to first
diagonalize A. But because diagonalizing a matrix A involves finding its eigenvalues and
eigenvectors, we will need to know how these quantities are related to those of $A^k$. As an
illustration, suppose that $\lambda$ is an eigenvalue of A and x is a corresponding eigenvector.
Then

$A^2 x = A(Ax) = A(\lambda x) = \lambda(Ax) = \lambda(\lambda x) = \lambda^2 x$

which shows not only that $\lambda^2$ is an eigenvalue of $A^2$ but that x is a corresponding
eigenvector. In general, we have the following result.


THEOREM 5.2.3 If k is a positive integer, $\lambda$ is an eigenvalue of a matrix A, and x is
a corresponding eigenvector, then $\lambda^k$ is an eigenvalue of $A^k$ and x is a corresponding
eigenvector.


► EXAMPLE 5 Eigenvalues and Eigenvectors of Matrix Powers 

In Example 2 we found the eigenvalues and corresponding eigenvectors of the matrix

\[ A = \begin{bmatrix} 1 & 0 & 0 \\ 1 & 2 & 0 \\ -3 & 5 & 2 \end{bmatrix} \]

Do the same for $A^7$.

Solution We know from Example 2 that the eigenvalues of A are $\lambda = 1$ and $\lambda = 2$, so
the eigenvalues of $A^7$ are $\lambda = 1^7 = 1$ and $\lambda = 2^7 = 128$. The eigenvectors $p_1$ and $p_2$
obtained in Example 2 corresponding to the eigenvalues $\lambda = 1$ and $\lambda = 2$ of A are also
the eigenvectors corresponding to the eigenvalues $\lambda = 1$ and $\lambda = 128$ of $A^7$.
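The conclusion of Example 5 can be checked directly by applying A seven times to each eigenvector. The sketch below is an illustration only, assuming the basis vectors (1/8, -1/8, 1) and (0, 0, 1) from Example 2; the helper names `matvec` and `apply_power` are hypothetical.

```python
from fractions import Fraction

def matvec(m, v):
    """Product of a matrix (list of rows) with a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def apply_power(m, v, k):
    """Compute (m**k) v by applying m to v a total of k times."""
    for _ in range(k):
        v = matvec(m, v)
    return v

A = [[1, 0, 0], [1, 2, 0], [-3, 5, 2]]

p1 = [Fraction(1, 8), Fraction(-1, 8), Fraction(1)]   # eigenvector for lambda = 1
p2 = [0, 0, 1]                                        # eigenvector for lambda = 2

r1 = apply_power(A, p1, 7)   # expected to equal 1**7 * p1
r2 = apply_power(A, p2, 7)   # expected to equal 2**7 * p2
```

The results satisfy `r1 == p1` and `r2 == [0, 0, 128]`, matching the eigenvalues 1 and 128 of $A^7$.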


Computing Powers of a 
Matrix 


Formula (3) reveals that raising a diagonalizable matrix A
to a positive integer power has the effect of raising its
eigenvalues to that power.


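The problem of computing powers of a matrix is greatly simplified when the matrix is
diagonalizable. To see why this is so, suppose that A is a diagonalizable $n \times n$ matrix,
that P diagonalizes A, and that

\[ P^{-1}AP = \begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{bmatrix} = D \]

Squaring both sides of this equation yields

\[ (P^{-1}AP)^2 = \begin{bmatrix} \lambda_1^2 & 0 & \cdots & 0 \\ 0 & \lambda_2^2 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & \lambda_n^2 \end{bmatrix} = D^2 \]

We can rewrite the left side of this equation as

$(P^{-1}AP)^2 = P^{-1}APP^{-1}AP = P^{-1}AIAP = P^{-1}A^2P$

from which we obtain the relationship $P^{-1}A^2P = D^2$. More generally, if k is a positive
integer, then a similar computation will show that

\[ P^{-1}A^kP = D^k = \begin{bmatrix} \lambda_1^k & 0 & \cdots & 0 \\ 0 & \lambda_2^k & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & \lambda_n^k \end{bmatrix} \]

which we can rewrite as

\[ A^k = PD^kP^{-1} = P \begin{bmatrix} \lambda_1^k & 0 & \cdots & 0 \\ 0 & \lambda_2^k & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & \lambda_n^k \end{bmatrix} P^{-1} \]    (3)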


► EXAMPLE 6 Powers of a Matrix 


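Use (3) to find $A^{13}$, where

\[ A = \begin{bmatrix} 0 & 0 & -2 \\ 1 & 2 & 1 \\ 1 & 0 & 3 \end{bmatrix} \]

Solution We showed in Example 1 that the matrix A is diagonalized by

\[ P = \begin{bmatrix} -1 & 0 & -2 \\ 0 & 1 & 1 \\ 1 & 0 & 1 \end{bmatrix} \]

and that

\[ D = P^{-1}AP = \begin{bmatrix} 2 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 1 \end{bmatrix} \]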

Geometric and Algebraic 
Multiplicity 
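Thus, it follows from (3) that

\[ A^{13} = PD^{13}P^{-1} = \begin{bmatrix} -1 & 0 & -2 \\ 0 & 1 & 1 \\ 1 & 0 & 1 \end{bmatrix} \begin{bmatrix} 2^{13} & 0 & 0 \\ 0 & 2^{13} & 0 \\ 0 & 0 & 1^{13} \end{bmatrix} \begin{bmatrix} 1 & 0 & 2 \\ 1 & 1 & 1 \\ -1 & 0 & -1 \end{bmatrix} = \begin{bmatrix} -8190 & 0 & -16382 \\ 8191 & 8192 & 8191 \\ 8191 & 0 & 16383 \end{bmatrix} \]    (4)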


Remark With the method in the preceding example, most of the work is in diagonalizing A.
Once that work is done, it can be used to compute any power of A. Thus, to compute $A^{1000}$ we
need only change the exponents from 13 to 1000 in (4).
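Formula (3) and the Remark can be sketched in code. The snippet below is a minimal illustration that reuses the matrices P and $P^{-1}$ from Example 1 and compares the diagonalization route with plain repeated multiplication; the function names `matmul`, `matpow`, and `power_via_diagonalization` are hypothetical.

```python
def matmul(x, y):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]

def matpow(m, k):
    """Compute m**k by repeated multiplication (k >= 1)."""
    result = m
    for _ in range(k - 1):
        result = matmul(result, m)
    return result

A = [[0, 0, -2], [1, 2, 1], [1, 0, 3]]
P = [[-1, 0, -2], [0, 1, 1], [1, 0, 1]]
P_inv = [[1, 0, 2], [1, 1, 1], [-1, 0, -1]]
eigenvalues = [2, 2, 1]   # diagonal entries of D, in the column order of P

def power_via_diagonalization(k):
    """Compute A**k as P D**k P^{-1}; only the diagonal entries are raised to k."""
    d_k = [[eigenvalues[i] ** k if i == j else 0 for j in range(3)] for i in range(3)]
    return matmul(matmul(P, d_k), P_inv)

A13 = power_via_diagonalization(13)
A1000 = power_via_diagonalization(1000)   # same work; only the exponent changes
```

The diagonalization route needs two matrix products for any exponent, whereas `matpow` needs k - 1 of them, which is the practical point of the Remark.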


Theorem 5.2.2(b) does not completely settle the diagonalizability question since it only 
guarantees that a square matrix with n distinct eigenvalues is diagonalizable; it does not 
preclude the possibility that there may exist diagonalizable matrices with fewer than n 
distinct eigenvalues. The following example shows that this is indeed the case. 


► EXAMPLE 7 The Converse of Theorem 5.2.2(b) Is False 

Consider the matrices

\[ I = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \quad \text{and} \quad J = \begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{bmatrix} \]

It follows from Theorem 5.1.2 that both of these matrices have only one distinct
eigenvalue, namely $\lambda = 1$, and hence only one eigenspace. We leave it as an exercise for you
to solve the characteristic equations

$(\lambda I - I)x = 0$ and $(\lambda I - J)x = 0$

with $\lambda = 1$ and show that for I the eigenspace is three-dimensional (all of $R^3$) and for J
it is one-dimensional, consisting of all scalar multiples of

\[ x = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} \]

This shows that the converse of Theorem 5.2.2(b) is false, since we have produced two
$3 \times 3$ matrices with fewer than three distinct eigenvalues, one of which is diagonalizable
and the other of which is not. ◄


A full excursion into the study of diagonalizability is left for more advanced courses,
but we will touch on one theorem that is important for a fuller understanding of
diagonalizability. It can be proved that if $\lambda_0$ is an eigenvalue of A, then the dimension of the
eigenspace corresponding to $\lambda_0$ cannot exceed the number of times that $\lambda - \lambda_0$ appears
as a factor of the characteristic polynomial of A. For example, in Examples 1 and 2 the
characteristic polynomial is

$(\lambda - 1)(\lambda - 2)^2$

Thus, the eigenspace corresponding to $\lambda = 1$ is at most (hence exactly) one-dimensional,
and the eigenspace corresponding to $\lambda = 2$ is at most two-dimensional. In Example 1
the eigenspace corresponding to $\lambda = 2$ actually had dimension 2, resulting in
diagonalizability, but in Example 2 the eigenspace corresponding to $\lambda = 2$ had only dimension 1,
resulting in nondiagonalizability.

There is some terminology that is related to these ideas. If $\lambda_0$ is an eigenvalue of an
$n \times n$ matrix A, then the dimension of the eigenspace corresponding to $\lambda_0$ is called the
geometric multiplicity of $\lambda_0$, and the number of times that $\lambda - \lambda_0$ appears as a factor in
the characteristic polynomial of A is called the algebraic multiplicity of $\lambda_0$. The following
theorem, which we state without proof, summarizes the preceding discussion.


Geometric and Algebraic Multiplicity

If A is a square matrix, then:

(a) For every eigenvalue of A, the geometric multiplicity is less than or equal to the
algebraic multiplicity.

(b) A is diagonalizable if and only if the geometric multiplicity of every eigenvalue is
equal to the algebraic multiplicity.


We will complete this section with an optional proof of Theorem 5.2.2(a). 


Proof of Theorem 5.2.2(a) Let $v_1, v_2, \ldots, v_k$ be eigenvectors of A corresponding to
distinct eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_k$. We will assume that $v_1, v_2, \ldots, v_k$ are linearly
dependent and obtain a contradiction. We can then conclude that $v_1, v_2, \ldots, v_k$ are linearly
independent.

Since an eigenvector is nonzero by definition, $\{v_1\}$ is linearly independent. Let r
be the largest integer such that $\{v_1, v_2, \ldots, v_r\}$ is linearly independent. Since we are
assuming that $\{v_1, v_2, \ldots, v_k\}$ is linearly dependent, r satisfies $1 \le r < k$. Moreover,
by the definition of r, $\{v_1, v_2, \ldots, v_{r+1}\}$ is linearly dependent. Thus, there are scalars
$c_1, c_2, \ldots, c_{r+1}$, not all zero, such that

$c_1 v_1 + c_2 v_2 + \cdots + c_{r+1} v_{r+1} = 0$    (5)

Multiplying both sides of (5) by A and using the fact that

$Av_1 = \lambda_1 v_1, \quad Av_2 = \lambda_2 v_2, \quad \ldots, \quad Av_{r+1} = \lambda_{r+1} v_{r+1}$

we obtain

$c_1 \lambda_1 v_1 + c_2 \lambda_2 v_2 + \cdots + c_{r+1} \lambda_{r+1} v_{r+1} = 0$    (6)

If we now multiply both sides of (5) by $\lambda_{r+1}$ and subtract the resulting equation from (6)
we obtain

$c_1(\lambda_1 - \lambda_{r+1})v_1 + c_2(\lambda_2 - \lambda_{r+1})v_2 + \cdots + c_r(\lambda_r - \lambda_{r+1})v_r = 0$

Since $\{v_1, v_2, \ldots, v_r\}$ is a linearly independent set, this equation implies that

$c_1(\lambda_1 - \lambda_{r+1}) = c_2(\lambda_2 - \lambda_{r+1}) = \cdots = c_r(\lambda_r - \lambda_{r+1}) = 0$

and since $\lambda_1, \lambda_2, \ldots, \lambda_{r+1}$ are assumed to be distinct, it follows that

$c_1 = c_2 = \cdots = c_r = 0$    (7)

Substituting these values in (5) yields

$c_{r+1} v_{r+1} = 0$

Since the eigenvector $v_{r+1}$ is nonzero, it follows that

$c_{r+1} = 0$    (8)

But equations (7) and (8) contradict the fact that $c_1, c_2, \ldots, c_{r+1}$ are not all zero, so the
proof is complete.

Exercise Set 5.2 


In Exercises 1-4, show that A and B are not similar matrices. 


1. A = 

'1 

_3 

r 

2_ 

, B 


'1 

.3 

1 1 

O <N 

1 

II 

'4 

2 


1 

, B 

4 

1 1 

1 1 

K> 4^ 

r 

4_ 


"l 

2 

3' 



' 1 

II 

0 

1 

2 

, B = 

1 

2 


0 

0 

1 



0 


"l 

0 

f 



"l 

4. A = 

2 

0 

2 

, B = 

2 


3 

0 

3 



0 


0 

0 

1 

o' 

0 

1 



'-1 


4 

-2 


'19 


-9 

-6 

11. A = 

-3 


4 

0 

II 

c4 

25 


-11 

-9 


-3 


1 

3 


17 


-9 

-4 


"o 

0 

o' 



'5 

0 

o' 


II 

cn 

0 

0 

0 


II 

1 

5 

0 



3 

0 

1 



0 

1 

5 



In each part of Exercises 15-16, the characteristic equation of
a matrix A is given. Find the size of the matrix and the possible
dimensions of its eigenspaces.

15. (a) $(\lambda - 1)(\lambda + 3)(\lambda - 5) = 0$
(b) $\lambda^2(\lambda - 1)(\lambda - 2)^3 = 0$

16. (a) $\lambda^3(\lambda^2 - 5\lambda - 6) = 0$
(b) $\lambda^3 - 3\lambda^2 + 3\lambda - 1 = 0$


In Exercises 5-8, find a matrix P that diagonalizes A, and
check your work by computing $P^{-1}AP$.


5. A = 


In Exercises 17-18, use the method of Example 6 to compute
the matrix $A^{10}$.


1. A — 


9. Let 


1 

0 


6. 

A 


-14 

12 












6 

-1 




-20 

17 

II 

0 

3 




00 

II 

1 

0 











2 

-1 




-1 

2 





'2 

0 

-2" 




"l 0 

o' 












0 

3 

0 

8. 

A 

= 

0 1 

1 

19. Let 











0 

0 

3 




0 1 

1 





7 

-l" 



"l 

1 

l" 










A = 

0 

1 

0 

and 

a — 

0 

0 

1 




'4 

0 

f 





0 

15 

-2 



1 

0 

5 


A = 


(a) Find the eigenvalues of A.

(b) For each eigenvalue $\lambda$, find the rank of the matrix $\lambda I - A$.

(c) Is A diagonalizable? Justify your conclusion.

10. Follow the directions in Exercise 9 for the matrix

\[ \begin{bmatrix} 3 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 1 & 2 \end{bmatrix} \]

Confirm that P diagonalizes A, and then compute $A^{11}$.

20. Let

\[ A = \begin{bmatrix} 1 & -2 & 8 \\ 0 & -1 & 0 \\ 0 & 0 & -1 \end{bmatrix} \quad \text{and} \quad P = \begin{bmatrix} 1 & -4 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix} \]

Confirm that P diagonalizes A, and then compute each of the
following powers of A.

(a) $A^{1000}$ (b) $A^{-1000}$ (c) $A^{2301}$ (d) $A^{-2301}$

21. Find $A^n$ if n is a positive integer and


In Exercises 11-14, find the geometric and algebraic multiplicity
of each eigenvalue of the matrix A, and determine whether A
is diagonalizable. If A is diagonalizable, then find a matrix P that
diagonalizes A, and find $P^{-1}AP$.



-1 

2 

-1 


0 

-1 

3 
22. Show that the matrices

\[ A = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix} \quad \text{and} \quad B = \begin{bmatrix} 3 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix} \]

are similar.

23. We know from Table 1 that similar matrices have the same 
rank. Show that the converse is false by showing that the 
matrices 



’l O’ 


’0 l" 

A = 

1 

O 

O 

1 

and B = 

1 

O 

O 

1 


have the same rank but are not similar. [Suggestion: If they 
were similar, then there would be an invertible 2x2 matrix P 
for which AP = PB. Show that there is no such matrix.] 

24. We know from Table 1 that similar matrices have the same
eigenvalues. Use the method of Exercise 23 to show that the
converse is false by showing that the matrices

\[ A = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} \quad \text{and} \quad B = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \]

have the same eigenvalues but are not similar.


29. In the case where the matrix A in Exercise 28 is diagonalizable, 
find a matrix P that diagonalizes A. [Hint: See Exercise 30 of 
Section 5.1.] 

In Exercises 30-33, find the standard matrix A for the given lin- 
ear operator, and determine whether that matrix is diagonalizable. 
If diagonalizable, find a matrix P that diagonalizes A. 

30. $T(x_1, x_2) = (2x_1 - x_2,\; x_1 + x_2)$

31. $T(x_1, x_2) = (-x_2,\; -x_1)$

32. $T(x_1, x_2, x_3) = (8x_1 + 3x_2 - 4x_3,\; -3x_1 + x_2 + 3x_3,\; 4x_1 + 3x_2)$

33. $T(x_1, x_2, x_3) = (3x_1,\; x_2,\; x_1 - x_2)$

34. If P is a fixed $n \times n$ matrix, then the similarity transformation

\[ A \to P^{-1}AP \]

can be viewed as an operator $S_P(A) = P^{-1}AP$ on the vector space $M_{nn}$ of $n \times n$ matrices.

(a) Show that $S_P$ is a linear operator.

(b) Find the kernel of $S_P$.

(c) Find the rank of $S_P$.


25. If A, B, and C are n x n matrices such that A is similar to B 
and B is similar to C, do you think that A must be similar to 
C? Justify your answer. 

26. (a) Is it possible for an n x n matrix to be similar to itself? 

Justify your answer. 

(b) What can you say about an $n \times n$ matrix that is similar to $0_{n \times n}$? Justify your answer.

(c) Is it possible for a nonsingular matrix to be similar to a singular matrix? Justify your answer.

27. Suppose that the characteristic polynomial of some matrix A is found to be $p(\lambda) = (\lambda - 1)(\lambda - 3)^2(\lambda - 4)^3$. In each part, answer the question and explain your reasoning.

(a) What can you say about the dimensions of the eigenspaces of A?

(b) What can you say about the dimensions of the eigenspaces if you know that A is diagonalizable?

(c) If $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ is a linearly independent set of eigenvectors of A, all of which correspond to the same eigenvalue of A, what can you say about that eigenvalue?

28. Let

\[ A = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \]

Show that

(a) A is diagonalizable if $(a - d)^2 + 4bc > 0$.

(b) A is not diagonalizable if $(a - d)^2 + 4bc < 0$.

[Hint: See Exercise 29 of Section 5.1.]


Working with Proofs 

35. Prove that similar matrices have the same rank and nullity. 

36. Prove that similar matrices have the same trace. 

37. Prove that if A is diagonalizable, then so is $A^k$ for every positive integer k.

38. We know from Table 1 that similar matrices, A and B, have the same eigenvalues. However, it is not true that those eigenvalues have the same corresponding eigenvectors for the two matrices. Prove that if $B = P^{-1}AP$, and v is an eigenvector of B corresponding to the eigenvalue $\lambda$, then $P\mathbf{v}$ is the eigenvector of A corresponding to $\lambda$.

39. Let A be an $n \times n$ matrix, and let q(A) be the matrix

\[ q(A) = a_n A^n + a_{n-1} A^{n-1} + \cdots + a_1 A + a_0 I_n \]

(a) Prove that if $B = P^{-1}AP$, then $q(B) = P^{-1}q(A)P$.

(b) Prove that if A is diagonalizable, then so is q(A).

40. Prove that if A is a diagonalizable matrix, then the rank of A 
is the number of nonzero eigenvalues of A. 

41. This problem will lead you through a proof of the fact that the algebraic multiplicity of an eigenvalue of an $n \times n$ matrix A is greater than or equal to the geometric multiplicity. For this purpose, assume that $\lambda_0$ is an eigenvalue with geometric multiplicity k.

(a) Prove that there is a basis $B = \{\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_n\}$ for $R^n$ in which the first k vectors of B form a basis for the eigenspace corresponding to $\lambda_0$.

(b) Let P be the matrix having the vectors in B as columns. Prove that the product AP can be expressed as

\[ AP = P \begin{bmatrix} \lambda_0 I_k & X \\ 0 & Y \end{bmatrix} \]

[Hint: Compare the first k column vectors on both sides.]

(c) Use the result in part (b) to prove that A is similar to

\[ C = \begin{bmatrix} \lambda_0 I_k & X \\ 0 & Y \end{bmatrix} \]

and hence that A and C have the same characteristic polynomial.

(d) By considering $\det(\lambda I - C)$, prove that the characteristic polynomial of C (and hence A) contains the factor $(\lambda - \lambda_0)$ at least k times, thereby proving that the algebraic multiplicity of $\lambda_0$ is greater than or equal to the geometric multiplicity k.


True-False Exercises 

TF. In parts (a)-(i) determine whether the statement is true or false, and justify your answer.

(a) An $n \times n$ matrix with fewer than n distinct eigenvalues is not diagonalizable.

(b) An $n \times n$ matrix with fewer than n linearly independent eigenvectors is not diagonalizable.

(c) If A and B are similar $n \times n$ matrices, then there exists an invertible $n \times n$ matrix P such that PA = BP.

(d) If A is diagonalizable, then there is a unique matrix P such that $P^{-1}AP$ is diagonal.

(e) If A is diagonalizable and invertible, then $A^{-1}$ is diagonalizable.

(f) If A is diagonalizable, then $A^T$ is diagonalizable.

(g) If there is a basis for $R^n$ consisting of eigenvectors of an $n \times n$ matrix A, then A is diagonalizable.

(h) If every eigenvalue of a matrix A has algebraic multiplicity 1, then A is diagonalizable.

(i) If 0 is an eigenvalue of a matrix A, then $A^2$ is singular.

Working with Technology

T1. Generate a random $4 \times 4$ matrix A and an invertible $4 \times 4$ matrix P and then confirm, as stated in Table 1, that $P^{-1}AP$ and A have the same

(a) determinant.

(b) rank.

(c) nullity.

(d) trace.

(e) characteristic polynomial.

(f) eigenvalues.

T2. (a) Use Theorem 5.2.1 to show that the following matrix is diagonalizable.

\[ A = \begin{bmatrix} -13 & -60 & -60 \\ 10 & 42 & 40 \\ -5 & -20 & -18 \end{bmatrix} \]

(b) Find a matrix P that diagonalizes A.

(c) Use the method of Example 6 to compute $A^{10}$, and check your result by computing $A^{10}$ directly.

T3. Use Theorem 5.2.1 to show that the following matrix is not diagonalizable.

\[ A = \begin{bmatrix} -10 & 11 & -6 \\ -15 & 16 & -10 \\ -3 & 3 & -2 \end{bmatrix} \]


5.3 Complex Vector Spaces 

Because the characteristic equation of any square matrix can have complex solutions, the 
notions of complex eigenvalues and eigenvectors arise naturally, even within the context of 
matrices with real entries. In this section we will discuss this idea and apply our results to 
study symmetric matrices in more detail. A review of the essentials of complex numbers 
appears in the back of this text. 


Review of Complex 
Numbers 


Recall that if $z = a + bi$ is a complex number, then:

• $\mathrm{Re}(z) = a$ and $\mathrm{Im}(z) = b$ are called the real part of z and the imaginary part of z, respectively,

• $|z| = \sqrt{a^2 + b^2}$ is called the modulus (or absolute value) of z,

• $\bar{z} = a - bi$ is called the complex conjugate of z,

• $z\bar{z} = a^2 + b^2 = |z|^2$,

• the angle $\phi$ in Figure 5.3.1 is called an argument of z,

• $\mathrm{Re}(z) = |z| \cos\phi$,

• $\mathrm{Im}(z) = |z| \sin\phi$,

• $z = |z|(\cos\phi + i \sin\phi)$ is called the polar form of z.
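The identities in this review can be spot-checked numerically. The following is a minimal sketch using Python's built-in complex type; the sample value z = 3 + 4i is an arbitrary choice made here for illustration and does not come from the text.

```python
import cmath
import math

z = 3 + 4j  # illustrative value only: a = 3, b = 4

modulus = abs(z)            # |z| = sqrt(a^2 + b^2)
conjugate = z.conjugate()   # a - bi
phi = cmath.phase(z)        # one argument of z, in radians

# z * conj(z) = a^2 + b^2 = |z|^2
product = z * conjugate
assert math.isclose(product.real, modulus ** 2)
assert product.imag == 0

# Re(z) = |z| cos(phi) and Im(z) = |z| sin(phi)
assert math.isclose(z.real, modulus * math.cos(phi))
assert math.isclose(z.imag, modulus * math.sin(phi))

# polar form: z = |z| (cos(phi) + i sin(phi))
rebuilt = modulus * complex(math.cos(phi), math.sin(phi))
assert cmath.isclose(rebuilt, z)

print(modulus, conjugate, phi)
```

Here `cmath.phase` returns the argument in the interval from -pi to pi; any value differing from it by a multiple of 2*pi is also an argument of z.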


Complex Eigenvalues 


In Formula (3) of Section 5.1 we observed that the characteristic equation of a general $n \times n$ matrix A has the form

\[ \lambda^n + c_1 \lambda^{n-1} + \cdots + c_n = 0 \tag{1} \]

in which the highest power of $\lambda$ has a coefficient of 1. Up to now we have limited our discussion to matrices in which the solutions of (1) are real numbers. However, it is possible for the characteristic equation of a matrix A with real entries to have imaginary solutions; for example, the characteristic equation of the matrix

\[ A = \begin{bmatrix} -2 & -1 \\ 5 & 2 \end{bmatrix} \]

is

\[ \begin{vmatrix} \lambda + 2 & 1 \\ -5 & \lambda - 2 \end{vmatrix} = \lambda^2 + 1 = 0 \]

which has the imaginary solutions $\lambda = i$ and $\lambda = -i$. To deal with this case we will need to explore the notion of a complex vector space and some related ideas.


Vectors in $C^n$

A vector space in which scalars are allowed to be complex numbers is called a complex vector space. In this section we will be concerned only with the following complex generalization of the real vector space $R^n$.


DEFINITION 1 If n is a positive integer, then a complex n-tuple is a sequence of n complex numbers $(v_1, v_2, \ldots, v_n)$. The set of all complex n-tuples is called complex n-space and is denoted by $C^n$. Scalars are complex numbers, and the operations of addition, subtraction, and scalar multiplication are performed componentwise.


The terminology used for n-tuples of real numbers applies to complex n-tuples without change. Thus, if $v_1, v_2, \ldots, v_n$ are complex numbers, then we call $\mathbf{v} = (v_1, v_2, \ldots, v_n)$ a vector in $C^n$ and $v_1, v_2, \ldots, v_n$ its components. Some examples of vectors in $C^3$ are

\[ \mathbf{u} = (1 + i,\; -4i,\; 3 + 2i), \quad \mathbf{v} = (0,\; i,\; 5), \quad \mathbf{w} = \left(6 - \sqrt{2}\,i,\; 9 + \tfrac{1}{2}i,\; \pi i\right) \]

Every vector

\[ \mathbf{v} = (v_1, v_2, \ldots, v_n) = (a_1 + b_1 i,\; a_2 + b_2 i,\; \ldots,\; a_n + b_n i) \]

in $C^n$ can be split into real and imaginary parts as

\[ \mathbf{v} = (a_1, a_2, \ldots, a_n) + i(b_1, b_2, \ldots, b_n) \]

which we also denote as

\[ \mathbf{v} = \mathrm{Re}(\mathbf{v}) + i\,\mathrm{Im}(\mathbf{v}) \]

where

\[ \mathrm{Re}(\mathbf{v}) = (a_1, a_2, \ldots, a_n) \quad\text{and}\quad \mathrm{Im}(\mathbf{v}) = (b_1, b_2, \ldots, b_n) \]

The vector

\[ \bar{\mathbf{v}} = (\bar{v}_1, \bar{v}_2, \ldots, \bar{v}_n) = (a_1 - b_1 i,\; a_2 - b_2 i,\; \ldots,\; a_n - b_n i) \]

is called the complex conjugate of v and can be expressed in terms of Re(v) and Im(v) as

\[ \bar{\mathbf{v}} = (a_1, a_2, \ldots, a_n) - i(b_1, b_2, \ldots, b_n) = \mathrm{Re}(\mathbf{v}) - i\,\mathrm{Im}(\mathbf{v}) \tag{2} \]

It follows that the vectors in $R^n$ can be viewed as those vectors in $C^n$ whose imaginary part is zero; or stated another way, a vector v in $C^n$ is in $R^n$ if and only if $\bar{\mathbf{v}} = \mathbf{v}$.

In this section we will need to distinguish between matrices whose entries must be real 
numbers, called real matrices , and matrices whose entries may be either real numbers 
or complex numbers, called complex matrices. When convenient, you can think of a 
real matrix as a complex matrix each of whose entries has a zero imaginary part. The 
standard operations on real matrices carry over without change to complex matrices, 
and all of the familiar properties of matrices continue to hold. 

If A is a complex matrix, then Re(A) and Im(A) are the matrices formed from the real and imaginary parts of the entries of A, and $\bar{A}$ is the matrix formed by taking the complex conjugate of each entry in A.

As you might expect, if A is a complex matrix, then A and $\bar{A}$ can be expressed in terms of Re(A) and Im(A) as

\[ A = \mathrm{Re}(A) + i\,\mathrm{Im}(A) \]

\[ \bar{A} = \mathrm{Re}(A) - i\,\mathrm{Im}(A) \]


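► EXAMPLE 1 Real and Imaginary Parts of Vectors and Matrices

Let

\[ \mathbf{v} = (3 + i,\; -2i,\; 5) \quad\text{and}\quad A = \begin{bmatrix} 1 + i & -i \\ 4 & 6 - 2i \end{bmatrix} \]

Then

\[ \bar{\mathbf{v}} = (3 - i,\; 2i,\; 5), \quad \mathrm{Re}(\mathbf{v}) = (3, 0, 5), \quad \mathrm{Im}(\mathbf{v}) = (1, -2, 0) \]

\[ \bar{A} = \begin{bmatrix} 1 - i & i \\ 4 & 6 + 2i \end{bmatrix}, \quad \mathrm{Re}(A) = \begin{bmatrix} 1 & 0 \\ 4 & 6 \end{bmatrix}, \quad \mathrm{Im}(A) = \begin{bmatrix} 1 & -1 \\ 0 & -2 \end{bmatrix} \]

\[ \det(A) = \begin{vmatrix} 1 + i & -i \\ 4 & 6 - 2i \end{vmatrix} = (1 + i)(6 - 2i) - (-i)(4) = 8 + 8i \] ◄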
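The computations in Example 1 can be reproduced with a few lines of code. The sketch below uses Python's built-in complex numbers and plain lists; the helper name `det2` is introduced here for illustration only and is not part of the text.

```python
# Vector and matrix from Example 1, stored as Python complex numbers.
v = [3 + 1j, -2j, 5 + 0j]
A = [[1 + 1j, -1j],
     [4 + 0j, 6 - 2j]]

# Componentwise conjugate, real part, and imaginary part of v.
v_bar = [z.conjugate() for z in v]
re_v = [z.real for z in v]
im_v = [z.imag for z in v]

# Entrywise conjugate, real part, and imaginary part of A.
A_bar = [[z.conjugate() for z in row] for row in A]
re_A = [[z.real for z in row] for row in A]
im_A = [[z.imag for z in row] for row in A]

def det2(M):
    """Determinant of a 2 x 2 matrix given as nested lists."""
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

print(v_bar, re_v, im_v)
print(A_bar, re_A, im_A)
print(det2(A))
```

The same entrywise pattern extends to matrices of any size, since Re, Im, and the conjugate are all defined entry by entry.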


Algebraic Properties of the 
Complex Conjugate 


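The next two theorems list some properties of complex vectors and matrices that we will need in this section. Some of the proofs are given as exercises.


THEOREM 5.3.1 If u and v are vectors in $C^n$, and if k is a scalar, then:

(a) $\overline{\overline{\mathbf{u}}} = \mathbf{u}$

(b) $\overline{k\mathbf{u}} = \bar{k}\,\bar{\mathbf{u}}$

(c) $\overline{\mathbf{u} + \mathbf{v}} = \bar{\mathbf{u}} + \bar{\mathbf{v}}$

(d) $\overline{\mathbf{u} - \mathbf{v}} = \bar{\mathbf{u}} - \bar{\mathbf{v}}$


THEOREM 5.3.2 If A is an $m \times k$ complex matrix and B is a $k \times n$ complex matrix, then:

(a) $\overline{\overline{A}} = A$

(b) $\overline{(A^T)} = (\bar{A})^T$

(c) $\overline{AB} = \bar{A}\,\bar{B}$


The Complex Euclidean Inner Product

The following definition extends the notions of dot product and norm to $C^n$.


DEFINITION 2 If $\mathbf{u} = (u_1, u_2, \ldots, u_n)$ and $\mathbf{v} = (v_1, v_2, \ldots, v_n)$ are vectors in $C^n$, then the complex Euclidean inner product of u and v (also called the complex dot product) is denoted by $\mathbf{u} \cdot \mathbf{v}$ and is defined as

\[ \mathbf{u} \cdot \mathbf{v} = u_1\bar{v}_1 + u_2\bar{v}_2 + \cdots + u_n\bar{v}_n \tag{3} \]

We also define the Euclidean norm on $C^n$ to be

\[ \|\mathbf{v}\| = \sqrt{\mathbf{v} \cdot \mathbf{v}} = \sqrt{|v_1|^2 + |v_2|^2 + \cdots + |v_n|^2} \tag{4} \]


The complex conjugates in (3) ensure that $\|\mathbf{v}\|$ is a real number, for without them the quantity $\mathbf{v} \cdot \mathbf{v}$ in (4) might be imaginary.

As in the real case, we call v a unit vector in $C^n$ if $\|\mathbf{v}\| = 1$, and we say two vectors u and v are orthogonal if $\mathbf{u} \cdot \mathbf{v} = 0$.

► EXAMPLE 2 Complex Euclidean Inner Product and Norm

Find $\mathbf{u} \cdot \mathbf{v}$, $\mathbf{v} \cdot \mathbf{u}$, $\|\mathbf{u}\|$, and $\|\mathbf{v}\|$ for the vectors

\[ \mathbf{u} = (1 + i,\; i,\; 3 - i) \quad\text{and}\quad \mathbf{v} = (1 + i,\; 2,\; 4i) \]

Solution

\[ \mathbf{u} \cdot \mathbf{v} = (1 + i)\overline{(1 + i)} + i\,\overline{(2)} + (3 - i)\overline{(4i)} = (1 + i)(1 - i) + 2i + (3 - i)(-4i) = -2 - 10i \]

\[ \mathbf{v} \cdot \mathbf{u} = (1 + i)\overline{(1 + i)} + 2\,\overline{(i)} + (4i)\overline{(3 - i)} = (1 + i)(1 - i) - 2i + 4i(3 + i) = -2 + 10i \]

\[ \|\mathbf{u}\| = \sqrt{|1 + i|^2 + |i|^2 + |3 - i|^2} = \sqrt{2 + 1 + 10} = \sqrt{13} \]

\[ \|\mathbf{v}\| = \sqrt{|1 + i|^2 + |2|^2 + |4i|^2} = \sqrt{2 + 4 + 16} = \sqrt{22} \] ◄

Recall from Table 1 of Section 3.2 that if u and v are column vectors in $R^n$, then their dot product can be expressed as

\[ \mathbf{u} \cdot \mathbf{v} = \mathbf{u}^T\mathbf{v} = \mathbf{v}^T\mathbf{u} \]

The analogous formulas in $C^n$ are (verify)

\[ \mathbf{u} \cdot \mathbf{v} = \mathbf{u}^T\bar{\mathbf{v}} = \bar{\mathbf{v}}^T\mathbf{u} \tag{5} \]

Example 2 reveals a major difference between the dot product on $R^n$ and the complex dot product on $C^n$. For the dot product on $R^n$ we always have $\mathbf{v} \cdot \mathbf{u} = \mathbf{u} \cdot \mathbf{v}$ (the symmetry property), but for the complex dot product the corresponding relationship is given by $\mathbf{u} \cdot \mathbf{v} = \overline{\mathbf{v} \cdot \mathbf{u}}$, which is called its antisymmetry property. The following theorem is an analog of Theorem 3.2.2.


THEOREM 5.3.3 If u, v, and w are vectors in $C^n$, and if k is a scalar, then the complex Euclidean inner product has the following properties:

(a) $\mathbf{u} \cdot \mathbf{v} = \overline{\mathbf{v} \cdot \mathbf{u}}$   [Antisymmetry property]

(b) $\mathbf{u} \cdot (\mathbf{v} + \mathbf{w}) = \mathbf{u} \cdot \mathbf{v} + \mathbf{u} \cdot \mathbf{w}$   [Distributive property]

(c) $k(\mathbf{u} \cdot \mathbf{v}) = (k\mathbf{u}) \cdot \mathbf{v}$   [Homogeneity property]

(d) $\mathbf{u} \cdot k\mathbf{v} = \bar{k}(\mathbf{u} \cdot \mathbf{v})$   [Antihomogeneity property]

(e) $\mathbf{v} \cdot \mathbf{v} \geq 0$ and $\mathbf{v} \cdot \mathbf{v} = 0$ if and only if $\mathbf{v} = \mathbf{0}$.   [Positivity property]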
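The values in Example 2 and several parts of Theorem 5.3.3 can be spot-checked numerically. This is a minimal sketch in plain Python; the helper names `cdot` and `norm` and the scalar k = 2 - 3i are assumptions made here for illustration, and `cdot` conjugates the second argument to match Formula (3).

```python
import math

def cdot(u, v):
    """Complex Euclidean inner product: sum of u_j * conj(v_j), as in (3)."""
    return sum(a * b.conjugate() for a, b in zip(u, v))

def norm(v):
    """Euclidean norm on C^n, as in (4); v . v is real, so take its real part."""
    return math.sqrt(cdot(v, v).real)

# Vectors from Example 2.
u = [1 + 1j, 1j, 3 - 1j]
v = [1 + 1j, 2 + 0j, 4j]

uv = cdot(u, v)
vu = cdot(v, u)
print(uv, vu, norm(u), norm(v))

# Antisymmetry, Theorem 5.3.3(a): u . v equals the conjugate of v . u.
assert uv == vu.conjugate()

# Homogeneity (c) and antihomogeneity (d) for an arbitrary sample scalar.
k = 2 - 3j
assert cdot([k * a for a in u], v) == k * uv
assert cdot(u, [k * b for b in v]) == k.conjugate() * uv
```

The exact equality tests work here only because every intermediate value is a small integer; for general floating-point data a tolerance-based comparison such as `cmath.isclose` would be the safer choice.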
Parts (c) and (d) of this theorem state that a scalar multiplying a complex Euclidean inner product can be regrouped with the first vector, but to regroup it with the second vector you must first take its complex conjugate. We will prove part (d), and leave the others as exercises.

Proof (d)

\[ k(\mathbf{u} \cdot \mathbf{v}) = k\,\overline{(\mathbf{v} \cdot \mathbf{u})} = \overline{\bar{k}}\;\overline{(\mathbf{v} \cdot \mathbf{u})} = \overline{\bar{k}(\mathbf{v} \cdot \mathbf{u})} = \overline{(\bar{k}\mathbf{v}) \cdot \mathbf{u}} = \mathbf{u} \cdot (\bar{k}\mathbf{v}) \]

To complete the proof, substitute $\bar{k}$ for k and use the fact that $\overline{\bar{k}} = k$.


Vector Concepts in $C^n$

Except for the use of complex scalars, the notions of linear combination, linear independence, subspace, spanning, basis, and dimension carry over without change to $C^n$.

Is $R^n$ a subspace of $C^n$? Explain.

Eigenvalues and eigenvectors are defined for complex matrices exactly as for real matrices. If A is an $n \times n$ matrix with complex entries, then the complex roots of the characteristic equation $\det(\lambda I - A) = 0$ are called complex eigenvalues of A. As in the real case, $\lambda$ is a complex eigenvalue of A if and only if there exists a nonzero vector x in $C^n$ such that $A\mathbf{x} = \lambda\mathbf{x}$. Each such x is called a complex eigenvector of A corresponding to $\lambda$. The complex eigenvectors of A corresponding to $\lambda$ are the nonzero solutions of the linear system $(\lambda I - A)\mathbf{x} = \mathbf{0}$, and the set of all such solutions is a subspace of $C^n$, called the complex eigenspace of A corresponding to $\lambda$.

The following theorem states that if a real matrix has complex eigenvalues, then those eigenvalues and their corresponding eigenvectors occur in conjugate pairs.


THEOREM 5.3.4 If $\lambda$ is an eigenvalue of a real $n \times n$ matrix A, and if x is a corresponding eigenvector, then $\bar{\lambda}$ is also an eigenvalue of A, and $\bar{\mathbf{x}}$ is a corresponding eigenvector.


Proof Since $\lambda$ is an eigenvalue of A and x is a corresponding eigenvector, we have

\[ \overline{A\mathbf{x}} = \overline{\lambda\mathbf{x}} = \bar{\lambda}\,\bar{\mathbf{x}} \tag{6} \]

However, $\bar{A} = A$, since A has real entries, so it follows from part (c) of Theorem 5.3.2 that

\[ \overline{A\mathbf{x}} = \bar{A}\,\bar{\mathbf{x}} = A\bar{\mathbf{x}} \tag{7} \]

Equations (6) and (7) together imply that

\[ A\bar{\mathbf{x}} = \overline{A\mathbf{x}} = \bar{\lambda}\,\bar{\mathbf{x}} \]

in which $\bar{\mathbf{x}} \neq \mathbf{0}$ (why?); this tells us that $\bar{\lambda}$ is an eigenvalue of A and $\bar{\mathbf{x}}$ is a corresponding eigenvector.


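► EXAMPLE 3 Complex Eigenvalues and Eigenvectors

Find the eigenvalues and bases for the eigenspaces of

\[ A = \begin{bmatrix} -2 & -1 \\ 5 & 2 \end{bmatrix} \]

Solution The characteristic polynomial of A is

\[ \begin{vmatrix} \lambda + 2 & 1 \\ -5 & \lambda - 2 \end{vmatrix} = \lambda^2 + 1 = (\lambda - i)(\lambda + i) \]

so the eigenvalues of A are $\lambda = i$ and $\lambda = -i$. Note that these eigenvalues are complex conjugates, as guaranteed by Theorem 5.3.4. To find the eigenvectors we must solve the system

\[ \begin{bmatrix} \lambda + 2 & 1 \\ -5 & \lambda - 2 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \]

with $\lambda = i$ and then with $\lambda = -i$. With $\lambda = i$, this system becomes

\[ \begin{bmatrix} i + 2 & 1 \\ -5 & i - 2 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \tag{8} \]

We could solve this system by reducing the augmented matrix

\[ \begin{bmatrix} i + 2 & 1 & 0 \\ -5 & i - 2 & 0 \end{bmatrix} \tag{9} \]

to reduced row echelon form by Gauss-Jordan elimination, though the complex arithmetic is somewhat tedious. A simpler procedure here is first to observe that the reduced row echelon form of (9) must have a row of zeros because (8) has nontrivial solutions. This being the case, each row of (9) must be a scalar multiple of the other, and hence the first row can be made into a row of zeros by adding a suitable multiple of the second row to it. Accordingly, we can simply set the entries in the first row to zero, then interchange the rows, and then multiply the new first row by $-\tfrac{1}{5}$ to obtain the reduced row echelon form

\[ \begin{bmatrix} 1 & \tfrac{2}{5} - \tfrac{1}{5}i & 0 \\ 0 & 0 & 0 \end{bmatrix} \]

Thus, a general solution of the system is

\[ x_1 = \left(-\tfrac{2}{5} + \tfrac{1}{5}i\right)t, \quad x_2 = t \]

This tells us that the eigenspace corresponding to $\lambda = i$ is one-dimensional and consists of all complex scalar multiples of the basis vector

\[ \mathbf{x} = \begin{bmatrix} -\tfrac{2}{5} + \tfrac{1}{5}i \\ 1 \end{bmatrix} \tag{10} \]

As a check, let us confirm that $A\mathbf{x} = i\mathbf{x}$. We obtain

\[ A\mathbf{x} = \begin{bmatrix} -2 & -1 \\ 5 & 2 \end{bmatrix} \begin{bmatrix} -\tfrac{2}{5} + \tfrac{1}{5}i \\ 1 \end{bmatrix} = \begin{bmatrix} -2\left(-\tfrac{2}{5} + \tfrac{1}{5}i\right) - 1 \\ 5\left(-\tfrac{2}{5} + \tfrac{1}{5}i\right) + 2 \end{bmatrix} = \begin{bmatrix} -\tfrac{1}{5} - \tfrac{2}{5}i \\ i \end{bmatrix} = i\mathbf{x} \]

We could find a basis for the eigenspace corresponding to $\lambda = -i$ in a similar way, but the work is unnecessary since Theorem 5.3.4 implies that

\[ \bar{\mathbf{x}} = \begin{bmatrix} -\tfrac{2}{5} - \tfrac{1}{5}i \\ 1 \end{bmatrix} \tag{11} \]

must be a basis for this eigenspace. The following computations confirm that $\bar{\mathbf{x}}$ is an eigenvector of A corresponding to $\lambda = -i$:

\[ A\bar{\mathbf{x}} = \begin{bmatrix} -2 & -1 \\ 5 & 2 \end{bmatrix} \begin{bmatrix} -\tfrac{2}{5} - \tfrac{1}{5}i \\ 1 \end{bmatrix} = \begin{bmatrix} -2\left(-\tfrac{2}{5} - \tfrac{1}{5}i\right) - 1 \\ 5\left(-\tfrac{2}{5} - \tfrac{1}{5}i\right) + 2 \end{bmatrix} = \begin{bmatrix} -\tfrac{1}{5} + \tfrac{2}{5}i \\ -i \end{bmatrix} = -i\bar{\mathbf{x}} \] ◄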
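The two checks at the end of Example 3 can also be carried out numerically. The sketch below is one way to do it in plain Python; the function names `matvec` and `is_eigenpair` and the tolerance 1e-12 are choices made here for illustration.

```python
import cmath

def matvec(M, x):
    """Product of a 2 x 2 matrix (nested lists) with a length-2 vector."""
    return [M[0][0] * x[0] + M[0][1] * x[1],
            M[1][0] * x[0] + M[1][1] * x[1]]

def is_eigenpair(M, lam, x, tol=1e-12):
    """Return True if M x agrees with lam * x componentwise (within tol) and x is nonzero."""
    if all(c == 0 for c in x):
        return False
    Mx = matvec(M, x)
    return all(cmath.isclose(p, lam * c, abs_tol=tol) for p, c in zip(Mx, x))

A = [[-2, -1],
     [5, 2]]

x = [complex(-2 / 5, 1 / 5), 1 + 0j]   # basis vector (10), for lambda = i
x_bar = [c.conjugate() for c in x]     # its conjugate (11), for lambda = -i

print(is_eigenpair(A, 1j, x))        # eigenpair (i, x)
print(is_eigenpair(A, -1j, x_bar))   # conjugate pair, as Theorem 5.3.4 predicts
print(is_eigenpair(A, 1j, x_bar))    # x_bar does not belong to lambda = i
```

The third call illustrates that conjugating the eigenvector without conjugating the eigenvalue does not give an eigenpair.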
Since a number of our subsequent examples will involve $2 \times 2$ matrices with real entries, it will be useful to discuss some general results about the eigenvalues of such matrices. Observe first that the characteristic polynomial of the matrix

\[ A = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \]

is

\[ \det(\lambda I - A) = \begin{vmatrix} \lambda - a & -b \\ -c & \lambda - d \end{vmatrix} = (\lambda - a)(\lambda - d) - bc = \lambda^2 - (a + d)\lambda + (ad - bc) \]

We can express this in terms of the trace and determinant of A as

\[ \det(\lambda I - A) = \lambda^2 - \mathrm{tr}(A)\lambda + \det(A) \tag{12} \]

from which it follows that the characteristic equation of A is

\[ \lambda^2 - \mathrm{tr}(A)\lambda + \det(A) = 0 \tag{13} \]

Now recall from algebra that if $ax^2 + bx + c = 0$ is a quadratic equation with real coefficients, then the discriminant $b^2 - 4ac$ determines the nature of the roots:

$b^2 - 4ac > 0$   [Two distinct real roots]

$b^2 - 4ac = 0$   [One repeated real root]

$b^2 - 4ac < 0$   [Two conjugate imaginary roots]

Applying this to (13) with $a = 1$, $b = -\mathrm{tr}(A)$, and $c = \det(A)$ yields the following theorem.


THEOREM 5.3.5 If A is a $2 \times 2$ matrix with real entries, then the characteristic equation of A is $\lambda^2 - \mathrm{tr}(A)\lambda + \det(A) = 0$ and

(a) A has two distinct real eigenvalues if $\mathrm{tr}(A)^2 - 4\det(A) > 0$;

(b) A has one repeated real eigenvalue if $\mathrm{tr}(A)^2 - 4\det(A) = 0$;

(c) A has two complex conjugate eigenvalues if $\mathrm{tr}(A)^2 - 4\det(A) < 0$.


► EXAMPLE 4 Eigenvalues of a 2 × 2 Matrix

In each part, use Formula (13) for the characteristic equation to find the eigenvalues of

\[ \text{(a) } A = \begin{bmatrix} 2 & 2 \\ -1 & 5 \end{bmatrix} \qquad \text{(b) } A = \begin{bmatrix} 0 & -1 \\ 1 & 2 \end{bmatrix} \qquad \text{(c) } A = \begin{bmatrix} 2 & 3 \\ -3 & 2 \end{bmatrix} \]

Solution (a) We have tr(A) = 7 and det(A) = 12, so the characteristic equation of A is

\[ \lambda^2 - 7\lambda + 12 = 0 \]

Factoring yields $(\lambda - 4)(\lambda - 3) = 0$, so the eigenvalues of A are $\lambda = 4$ and $\lambda = 3$.

Solution (b) We have tr(A) = 2 and det(A) = 1, so the characteristic equation of A is

\[ \lambda^2 - 2\lambda + 1 = 0 \]

Factoring this equation yields $(\lambda - 1)^2 = 0$, so $\lambda = 1$ is the only eigenvalue of A; it has algebraic multiplicity 2.

Solution (c) We have tr(A) = 4 and det(A) = 13, so the characteristic equation of A is

\[ \lambda^2 - 4\lambda + 13 = 0 \]

Solving this equation by the quadratic formula yields

\[ \lambda = \frac{4 \pm \sqrt{(-4)^2 - 4(13)}}{2} = \frac{4 \pm \sqrt{-36}}{2} = 2 \pm 3i \]

Thus, the eigenvalues of A are $\lambda = 2 + 3i$ and $\lambda = 2 - 3i$. ◄


Olga Taussky-Todd (1906-1995)

Olga Taussky-Todd was one of the pioneering women in matrix analysis and the first woman appointed to the faculty at the California Institute of Technology. She worked at the National Physical Laboratory in London during World War II, where she was assigned to study flutter in supersonic aircraft. While there, she realized that some results about the eigenvalues of a certain 6 × 6 complex matrix could be used to answer key questions about the flutter problem that would otherwise have required laborious calculation. After World War II Olga Taussky-Todd continued her work on matrix-related subjects and helped to draw many known but disparate results about matrices into the coherent subject that we now call matrix theory.

[Image: Courtesy of the Archives, California Institute of Technology]
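The procedure used in Example 4 (form the trace and determinant, then apply the quadratic formula to Formula (13)) is easy to automate. The following is a minimal sketch in plain Python; the function name `eigenvalues_2x2` and its return convention are assumptions made here for illustration.

```python
import cmath

def eigenvalues_2x2(A):
    """Eigenvalues of a real 2 x 2 matrix from lambda^2 - tr(A) lambda + det(A) = 0.

    Returns (lambda_1, lambda_2, discriminant), where the discriminant
    tr(A)^2 - 4 det(A) decides which case of Theorem 5.3.5 applies.
    """
    (a, b), (c, d) = A
    trace = a + d
    det = a * d - b * c
    disc = trace * trace - 4 * det
    root = cmath.sqrt(disc)  # complex square root, so disc < 0 is handled
    return (trace + root) / 2, (trace - root) / 2, disc

# The three matrices of Example 4.
examples = {
    "a": [[2, 2], [-1, 5]],
    "b": [[0, -1], [1, 2]],
    "c": [[2, 3], [-3, 2]],
}

for label, A in examples.items():
    lam1, lam2, disc = eigenvalues_2x2(A)
    if disc > 0:
        kind = "two distinct real eigenvalues"
    elif disc == 0:
        kind = "one repeated real eigenvalue"
    else:
        kind = "two complex conjugate eigenvalues"
    print(label, lam1, lam2, kind)
```

Using `cmath.sqrt` rather than `math.sqrt` is the design choice that lets one function cover all three cases; the real eigenvalues simply come back as complex numbers with zero imaginary part.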

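Symmetric Matrices Have Real Eigenvalues

Our next result, which is concerned with the eigenvalues of real symmetric matrices, is important in a wide variety of applications. The key to its proof is to think of a real symmetric matrix as a complex matrix whose entries have an imaginary part of zero.


THEOREM 5.3.6 If A is a real symmetric matrix, then A has real eigenvalues.


Proof Suppose that $\lambda$ is an eigenvalue of A and x is a corresponding eigenvector, where we allow for the possibility that $\lambda$ is complex and x is in $C^n$. Thus,

\[ A\mathbf{x} = \lambda\mathbf{x} \]

where $\mathbf{x} \neq \mathbf{0}$. If we multiply both sides of this equation by $\bar{\mathbf{x}}^T$ and use the fact that

\[ \bar{\mathbf{x}}^T A\mathbf{x} = \bar{\mathbf{x}}^T(\lambda\mathbf{x}) = \lambda(\bar{\mathbf{x}}^T\mathbf{x}) = \lambda(\mathbf{x} \cdot \mathbf{x}) = \lambda\|\mathbf{x}\|^2 \]

then we obtain

\[ \lambda = \frac{\bar{\mathbf{x}}^T A\mathbf{x}}{\|\mathbf{x}\|^2} \]

Since the denominator in this expression is real, we can prove that $\lambda$ is real by showing that

\[ \overline{\bar{\mathbf{x}}^T A\mathbf{x}} = \bar{\mathbf{x}}^T A\mathbf{x} \tag{14} \]

But A is symmetric and has real entries, so it follows from the second equality in (5) and properties of the conjugate that

\[ \overline{\bar{\mathbf{x}}^T A\mathbf{x}} = \overline{\bar{\mathbf{x}}^T}\;\overline{A\mathbf{x}} = \mathbf{x}^T\,\overline{A\mathbf{x}} = (\overline{A\mathbf{x}})^T\mathbf{x} = (\bar{A}\bar{\mathbf{x}})^T\mathbf{x} = (A\bar{\mathbf{x}})^T\mathbf{x} = \bar{\mathbf{x}}^T A^T\mathbf{x} = \bar{\mathbf{x}}^T A\mathbf{x} \]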
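For the $2 \times 2$ case, Theorem 5.3.6 can also be seen directly from Theorem 5.3.5. The short derivation below is a supplementary sketch rather than part of the proof above; the entry names a, b, d for a general real symmetric $2 \times 2$ matrix are chosen here for illustration.

```latex
\[
A = \begin{bmatrix} a & b \\ b & d \end{bmatrix}
\quad\Longrightarrow\quad
\operatorname{tr}(A)^2 - 4\det(A)
  = (a + d)^2 - 4(ad - b^2)
  = (a - d)^2 + 4b^2 \;\geq\; 0
\]
```

Since the discriminant is never negative, case (c) of Theorem 5.3.5 cannot occur for a real symmetric $2 \times 2$ matrix, so both eigenvalues are real. The eigenvalue is repeated only when $a = d$ and $b = 0$, that is, when A is a scalar multiple of the identity.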


A Geometric Interpretation of Complex Eigenvalues

The following theorem is the key to understanding the geometric significance of complex eigenvalues of real $2 \times 2$ matrices.


THEOREM 5.3.7 The eigenvalues of the real matrix

\[ C = \begin{bmatrix} a & -b \\ b & a \end{bmatrix} \tag{15} \]

are $\lambda = a \pm bi$. If a and b are not both zero, then this matrix can be factored as

\[ \begin{bmatrix} a & -b \\ b & a \end{bmatrix} = \begin{bmatrix} |\lambda| & 0 \\ 0 & |\lambda| \end{bmatrix} \begin{bmatrix} \cos\phi & -\sin\phi \\ \sin\phi & \cos\phi \end{bmatrix} \tag{16} \]

where $\phi$ is the angle from the positive x-axis to the ray that joins the origin to the point (a, b) (Figure 5.3.2).


Geometrically, this theorem states that multiplication by a matrix of form (15) can be viewed as a rotation through the angle $\phi$ followed by a scaling with factor $|\lambda|$ (Figure 5.3.3).

Proof The characteristic equation of C is $(\lambda - a)^2 + b^2 = 0$ (verify), from which it follows that the eigenvalues of C are $\lambda = a \pm bi$. Assuming that a and b are not both zero, let $\phi$ be the angle from the positive x-axis to the ray that joins the origin to the point (a, b). The angle $\phi$ is an argument of the eigenvalue $\lambda = a + bi$, so we see from Figure 5.3.2 that

\[ a = |\lambda| \cos\phi \quad\text{and}\quad b = |\lambda| \sin\phi \]

It follows from this that the matrix in (15) can be written as

\[ \begin{bmatrix} a & -b \\ b & a \end{bmatrix} = \begin{bmatrix} |\lambda| & 0 \\ 0 & |\lambda| \end{bmatrix} \begin{bmatrix} \dfrac{a}{|\lambda|} & -\dfrac{b}{|\lambda|} \\[1ex] \dfrac{b}{|\lambda|} & \dfrac{a}{|\lambda|} \end{bmatrix} = \begin{bmatrix} |\lambda| & 0 \\ 0 & |\lambda| \end{bmatrix} \begin{bmatrix} \cos\phi & -\sin\phi \\ \sin\phi & \cos\phi \end{bmatrix} \]
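The factorization in (16) can be sketched in code: compute the modulus and an argument of a + bi, then rebuild the matrix as a scaling times a rotation. The function names below and the sample values a = 3, b = 4 are assumptions made here for illustration.

```python
import math

def scale_and_angle(a, b):
    """Return (|lambda|, phi) for lambda = a + bi, with -pi < phi <= pi."""
    if a == 0 and b == 0:
        raise ValueError("a and b must not both be zero")
    return math.hypot(a, b), math.atan2(b, a)

def scaled_rotation(modulus, phi):
    """Product of the scaling matrix diag(modulus, modulus) and the rotation through phi."""
    c, s = math.cos(phi), math.sin(phi)
    return [[modulus * c, -modulus * s],
            [modulus * s, modulus * c]]

a, b = 3.0, 4.0                      # illustrative values only
modulus, phi = scale_and_angle(a, b)
C = scaled_rotation(modulus, phi)    # should reproduce [[a, -b], [b, a]]

expected = [[a, -b], [b, a]]
for i in range(2):
    for j in range(2):
        assert math.isclose(C[i][j], expected[i][j], abs_tol=1e-12)

print(modulus, math.degrees(phi))
```

`math.atan2(b, a)` is used instead of `math.atan(b / a)` so that the angle lands in the correct quadrant and the case a = 0 causes no division by zero.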


The following theorem, whose proof is considered in the exercises, shows that every real $2 \times 2$ matrix with complex eigenvalues is similar to a matrix of form (15).


THEOREM 5.3.8 Let A be a real $2 \times 2$ matrix with complex eigenvalues $\lambda = a \pm bi$ (where $b \neq 0$). If x is an eigenvector of A corresponding to $\lambda = a - bi$, then the matrix $P = [\mathrm{Re}(\mathbf{x}) \;\; \mathrm{Im}(\mathbf{x})]$ is invertible and

\[ A = P \begin{bmatrix} a & -b \\ b & a \end{bmatrix} P^{-1} \tag{17} \]


► EXAMPLE 5 A Matrix Factorization Using Complex Eigenvalues

Factor the matrix in Example 3 into form (17) using the eigenvalue $\lambda = -i$ and the corresponding eigenvector that was given in (11).

Solution For consistency with the notation in Theorem 5.3.8, let us denote the eigenvector in (11) that corresponds to $\lambda = -i$ by x (rather than $\bar{\mathbf{x}}$ as before). For this $\lambda$ and x we have

\[ a = 0, \quad b = 1, \quad \mathrm{Re}(\mathbf{x}) = \begin{bmatrix} -\tfrac{2}{5} \\ 1 \end{bmatrix}, \quad \mathrm{Im}(\mathbf{x}) = \begin{bmatrix} -\tfrac{1}{5} \\ 0 \end{bmatrix} \]

Thus,

\[ P = [\mathrm{Re}(\mathbf{x}) \;\; \mathrm{Im}(\mathbf{x})] = \begin{bmatrix} -\tfrac{2}{5} & -\tfrac{1}{5} \\ 1 & 0 \end{bmatrix} \]

so A can be factored in form (17) as

\[ \begin{bmatrix} -2 & -1 \\ 5 & 2 \end{bmatrix} = \begin{bmatrix} -\tfrac{2}{5} & -\tfrac{1}{5} \\ 1 & 0 \end{bmatrix} \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} 0 & 1 \\ -5 & -2 \end{bmatrix} \]

You may want to confirm this by multiplying out the right side. ◄


A Geometric Interpretation of Theorem 5.3.8

To clarify what Theorem 5.3.8 says geometrically, let us denote the matrices on the right side of (16) by S and $R_\phi$, respectively, and then use (16) to rewrite (17) as

\[ A = PSR_\phi P^{-1} = P \begin{bmatrix} |\lambda| & 0 \\ 0 & |\lambda| \end{bmatrix} \begin{bmatrix} \cos\phi & -\sin\phi \\ \sin\phi & \cos\phi \end{bmatrix} P^{-1} \tag{18} \]

If we now view P as the transition matrix from the basis $B = \{\mathrm{Re}(\mathbf{x}), \mathrm{Im}(\mathbf{x})\}$ to the standard basis, then (18) tells us that computing a product $A\mathbf{x}_0$ can be broken down into a three-step process:


Interpreting Formula (18)

Step 1. Map $\mathbf{x}_0$ from standard coordinates into B-coordinates by forming the product $P^{-1}\mathbf{x}_0$.

Step 2. Rotate and scale the vector $P^{-1}\mathbf{x}_0$ by forming the product $SR_\phi P^{-1}\mathbf{x}_0$.

Step 3. Map the rotated and scaled vector back to standard coordinates to obtain $A\mathbf{x}_0 = PSR_\phi P^{-1}\mathbf{x}_0$.


Power Sequences

There are many problems in which one is interested in how successive applications of a matrix transformation affect a specific vector. For example, if A is the standard matrix for an operator on $R^n$ and $\mathbf{x}_0$ is some fixed vector in $R^n$, then one might be interested in the behavior of the power sequence

\[ \mathbf{x}_0,\; A\mathbf{x}_0,\; A^2\mathbf{x}_0,\; \ldots,\; A^k\mathbf{x}_0,\; \ldots \]

For example, if

\[ A = \begin{bmatrix} \tfrac{1}{2} & \tfrac{3}{4} \\ -\tfrac{3}{5} & \tfrac{11}{10} \end{bmatrix} \quad\text{and}\quad \mathbf{x}_0 = \begin{bmatrix} 1 \\ 1 \end{bmatrix} \]

then with the help of a computer or calculator one can show that the first four terms in the power sequence are

\[ \mathbf{x}_0 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \quad A\mathbf{x}_0 = \begin{bmatrix} 1.25 \\ 0.5 \end{bmatrix}, \quad A^2\mathbf{x}_0 = \begin{bmatrix} 1.0 \\ -0.2 \end{bmatrix}, \quad A^3\mathbf{x}_0 = \begin{bmatrix} 0.35 \\ -0.82 \end{bmatrix} \]

With the help of MATLAB or a computer algebra system one can show that if the first 100 terms are plotted as ordered pairs (x, y), then the points move along the elliptical path shown in Figure 5.3.4a.
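The first terms of this power sequence can be generated with a short loop. The sketch below uses Python's `fractions` module so that the arithmetic is exact; the function name `power_sequence` is an assumption made here, and plotting is left out so that the example depends only on the standard library.

```python
from fractions import Fraction as F

def matvec(M, x):
    """Product of a 2 x 2 matrix (nested lists) with a length-2 vector."""
    return [M[0][0] * x[0] + M[0][1] * x[1],
            M[1][0] * x[0] + M[1][1] * x[1]]

def power_sequence(M, x0, count):
    """Return the list [x0, M x0, M^2 x0, ...] with `count` terms."""
    terms = [list(x0)]
    while len(terms) < count:
        terms.append(matvec(M, terms[-1]))
    return terms

A = [[F(1, 2), F(3, 4)],
     [F(-3, 5), F(11, 10)]]
x0 = [F(1), F(1)]

for k, term in enumerate(power_sequence(A, x0, 4)):
    print(k, [float(c) for c in term])
```

Each term is obtained from the previous one by a single matrix-vector product, which avoids forming the matrix powers themselves. Requesting 100 terms and passing the coordinate pairs to any plotting tool would reproduce the kind of picture the text describes.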
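To understand why the points move along an elliptical path, we will need to examine the eigenvalues and eigenvectors of A. We leave it for you to show that the eigenvalues of A are $\lambda = \tfrac{4}{5} \pm \tfrac{3}{5}i$ and that the corresponding eigenvectors are

\[ \lambda_1 = \tfrac{4}{5} - \tfrac{3}{5}i:\; \mathbf{v}_1 = \left(\tfrac{1}{2} + i,\; 1\right) \quad\text{and}\quad \lambda_2 = \tfrac{4}{5} + \tfrac{3}{5}i:\; \mathbf{v}_2 = \left(\tfrac{1}{2} - i,\; 1\right) \]

If we take $\lambda = \lambda_1 = \tfrac{4}{5} - \tfrac{3}{5}i$ and $\mathbf{x} = \mathbf{v}_1 = \left(\tfrac{1}{2} + i,\; 1\right)$ in (17) and use the fact that $|\lambda| = 1$, then we obtain the factorization

\[ \underbrace{\begin{bmatrix} \tfrac{1}{2} & \tfrac{3}{4} \\ -\tfrac{3}{5} & \tfrac{11}{10} \end{bmatrix}}_{A} = \underbrace{\begin{bmatrix} \tfrac{1}{2} & 1 \\ 1 & 0 \end{bmatrix}}_{P} \underbrace{\begin{bmatrix} \tfrac{4}{5} & -\tfrac{3}{5} \\ \tfrac{3}{5} & \tfrac{4}{5} \end{bmatrix}}_{R_\phi} \underbrace{\begin{bmatrix} 0 & 1 \\ 1 & -\tfrac{1}{2} \end{bmatrix}}_{P^{-1}} \tag{19} \]

where $R_\phi$ is a rotation about the origin through the angle $\phi$ whose tangent is

\[ \tan\phi = \frac{\sin\phi}{\cos\phi} = \frac{3/5}{4/5} = \frac{3}{4} \qquad \left(\phi = \tan^{-1}\tfrac{3}{4} \approx 36.9^\circ\right) \]

The matrix P in (19) is the transition matrix from the basis

\[ B = \{\mathrm{Re}(\mathbf{x}), \mathrm{Im}(\mathbf{x})\} = \left\{\left(\tfrac{1}{2}, 1\right), (1, 0)\right\} \]

to the standard basis, and $P^{-1}$ is the transition matrix from the standard basis to the basis B (Figure 5.3.5). Next, observe that if n is a positive integer, then (19) implies that

\[ A^n\mathbf{x}_0 = (PR_\phi P^{-1})^n\mathbf{x}_0 = PR_\phi^n P^{-1}\mathbf{x}_0 \]

so the product $A^n\mathbf{x}_0$ can be computed by first mapping $\mathbf{x}_0$ into the point $P^{-1}\mathbf{x}_0$ in B-coordinates, then multiplying by $R_\phi^n$ to rotate this point about the origin through the angle $n\phi$, and then multiplying $R_\phi^n P^{-1}\mathbf{x}_0$ by P to map the resulting point back to standard coordinates. We can now see what is happening geometrically: In B-coordinates each successive multiplication by A causes the point $P^{-1}\mathbf{x}_0$ to advance through an angle $\phi$, thereby tracing a circular orbit about the origin. However, the basis B is skewed (not orthogonal), so when the points on the circular orbit are transformed back to standard coordinates, the effect is to distort the circular orbit into the elliptical orbit traced by $A^n\mathbf{x}_0$ (Figure 5.3.4b). Here are the computations for the first step (successive steps are illustrated in Figure 5.3.4c):

\[ \begin{bmatrix} \tfrac{1}{2} & \tfrac{3}{4} \\ -\tfrac{3}{5} & \tfrac{11}{10} \end{bmatrix} \begin{bmatrix} 1 \\ 1 \end{bmatrix} = \begin{bmatrix} \tfrac{1}{2} & 1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} \tfrac{4}{5} & -\tfrac{3}{5} \\ \tfrac{3}{5} & \tfrac{4}{5} \end{bmatrix} \begin{bmatrix} 0 & 1 \\ 1 & -\tfrac{1}{2} \end{bmatrix} \begin{bmatrix} 1 \\ 1 \end{bmatrix} \]

\[ = \begin{bmatrix} \tfrac{1}{2} & 1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} \tfrac{4}{5} & -\tfrac{3}{5} \\ \tfrac{3}{5} & \tfrac{4}{5} \end{bmatrix} \begin{bmatrix} 1 \\ \tfrac{1}{2} \end{bmatrix} \qquad [\mathbf{x}_0 \text{ is mapped to } B\text{-coordinates.}] \]

\[ = \begin{bmatrix} \tfrac{1}{2} & 1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} \tfrac{1}{2} \\ 1 \end{bmatrix} \qquad [\text{The point } (1, \tfrac{1}{2}) \text{ is rotated through the angle } \phi.] \]

\[ = \begin{bmatrix} \tfrac{5}{4} \\ \tfrac{1}{2} \end{bmatrix} \qquad [\text{The point } (\tfrac{1}{2}, 1) \text{ is mapped to standard coordinates.}] \]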
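The three-step reading of the factorization (map to B-coordinates, rotate, map back) can be traced in code. The following sketch uses exact rational arithmetic from Python's `fractions` module with the matrices P, the rotation, and the inverse of P from (19); the variable names and the choice of five iterations are assumptions made here for illustration.

```python
from fractions import Fraction as F

def matvec(M, x):
    """Product of a 2 x 2 matrix (nested lists) with a length-2 vector."""
    return [M[0][0] * x[0] + M[0][1] * x[1],
            M[1][0] * x[0] + M[1][1] * x[1]]

A = [[F(1, 2), F(3, 4)], [F(-3, 5), F(11, 10)]]
P = [[F(1, 2), F(1)], [F(1), F(0)]]
R = [[F(4, 5), F(-3, 5)], [F(3, 5), F(4, 5)]]   # rotation with cos = 4/5, sin = 3/5
P_inv = [[F(0), F(1)], [F(1), F(-1, 2)]]

x0 = [F(1), F(1)]

# Step 1: map x0 into B-coordinates.
y = matvec(P_inv, x0)
radius_sq = y[0] ** 2 + y[1] ** 2   # squared distance from the origin in B-coordinates

x = x0
for n in range(1, 6):
    y = matvec(R, y)        # Step 2: rotate through phi (no scaling, since |lambda| = 1)
    x = matvec(A, x)        # direct computation of A^n x0 for comparison
    # Step 3: mapping back with P reproduces A^n x0 exactly.
    assert matvec(P, y) == x
    # In B-coordinates the point stays on one circle about the origin.
    assert y[0] ** 2 + y[1] ** 2 == radius_sq
    print(n, [float(c) for c in x])
```

Exact fractions work here because the rotation has rational entries (a 3-4-5 triangle); for a general angle, floating-point arithmetic with a tolerance would be needed instead.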


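Exercise Set 5.3

In Exercises 1-2, find $\bar{\mathbf{u}}$, Re(u), Im(u), and $\|\mathbf{u}\|$.

1. $\mathbf{u} = (2 - i,\; 4i,\; 1 + i)$

2. $\mathbf{u} = (6,\; 1 + 4i,\; 6 - 2i)$

In Exercises 3-4, show that u, v, and k satisfy Theorem 5.3.1.

3. $\mathbf{u} = (3 - 4i,\; 2 + i,\; -6i)$, $\mathbf{v} = (1 + i,\; 2 - i,\; 4)$, $k = i$

4. $\mathbf{u} = (6,\; 1 + 4i,\; 6 - 2i)$, $\mathbf{v} = (4,\; 3 + 2i,\; i - 3)$, $k = -i$

5. Solve the equation $i\mathbf{x} - 3\mathbf{v} = \mathbf{u}$ for x, where u and v are the vectors in Exercise 3.

6. Solve the equation $(1 + i)\mathbf{x} + 2\mathbf{u} = \mathbf{v}$ for x, where u and v are the vectors in Exercise 4.

In Exercises 7-8, find $\bar{A}$, Re(A), Im(A), det(A), and tr(A).

7. $A = \begin{bmatrix} -5i & 4 \\ 2 - i & 1 + 5i \end{bmatrix}$

8. $A = \begin{bmatrix} 4i & 2 - 3i \\ 2 + 3i & 1 \end{bmatrix}$

9. Let A be the matrix given in Exercise 7, and let B be the matrix

\[ B = \begin{bmatrix} 1 - i \\ 2i \end{bmatrix} \]

Confirm that these matrices have the properties stated in Theorem 5.3.2.

10. Let A be the matrix given in Exercise 8, and let B be the matrix

\[ B = \begin{bmatrix} 5i \\ 1 - 4i \end{bmatrix} \]

Confirm that these matrices have the properties stated in Theorem 5.3.2.

In Exercises 11-12, compute $\mathbf{u} \cdot \mathbf{v}$, $\mathbf{u} \cdot \mathbf{w}$, and $\mathbf{v} \cdot \mathbf{w}$, and show that the vectors satisfy Formula (5) and parts (a), (b), and (c) of Theorem 5.3.3.

11. $\mathbf{u} = (i,\; 2i,\; 3)$, $\mathbf{v} = (4,\; -2i,\; 1 + i)$, $\mathbf{w} = (2 - i,\; 2i,\; 5 + 3i)$, $k = 2i$

12. $\mathbf{u} = (1 + i,\; 4,\; 3i)$, $\mathbf{v} = (3,\; -4i,\; 2 + 3i)$, $\mathbf{w} = (1 - i,\; 4i,\; 4 - 5i)$, $k = 1 + i$

13. Compute $(\mathbf{u} \cdot \mathbf{v}) - \mathbf{w} \cdot \mathbf{u}$ for the vectors u, v, and w in Exercise 11.

14. Compute $(i\mathbf{u} \cdot \mathbf{w}) + (\|\mathbf{u}\|\mathbf{v}) \cdot \mathbf{u}$ for the vectors u, v, and w in Exercise 12.

In Exercises 15-18, find the eigenvalues and bases for the eigenspaces of A.

15. $A = \begin{bmatrix} 4 & -5 \\ 1 & 0 \end{bmatrix}$

16. $A = \begin{bmatrix} -1 & -5 \\ 4 & 7 \end{bmatrix}$

17. $A = \begin{bmatrix} 5 & -2 \\ 1 & 3 \end{bmatrix}$

18. $A = \begin{bmatrix} 8 & 6 \\ -3 & 2 \end{bmatrix}$

In Exercises 19-22, each matrix C has form (15). Theorem 5.3.7 implies that C is the product of a scaling matrix with factor $|\lambda|$ and a rotation matrix with angle $\phi$. Find $|\lambda|$ and $\phi$ for which $-\pi < \phi \leq \pi$.

19. $C = \begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix}$

20. $C = \begin{bmatrix} 0 & 5 \\ -5 & 0 \end{bmatrix}$

21. $C = \begin{bmatrix} 1 & \sqrt{3} \\ -\sqrt{3} & 1 \end{bmatrix}$

22. $C = \begin{bmatrix} \sqrt{2} & \sqrt{2} \\ -\sqrt{2} & \sqrt{2} \end{bmatrix}$

In Exercises 23-26, find an invertible matrix P and a matrix C of form (15) such that $A = PCP^{-1}$.

23. $A = \begin{bmatrix} -1 & -5 \\ 4 & 7 \end{bmatrix}$

24. $A = \begin{bmatrix} 4 & -5 \\ 1 & 0 \end{bmatrix}$

25. $A = \begin{bmatrix} 8 & 6 \\ -3 & 2 \end{bmatrix}$

26. $A = \begin{bmatrix} 5 & -2 \\ 1 & 3 \end{bmatrix}$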
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27. Find all complex scalars k , if any, for which u and v are or- 
thogonal in C 3 . 

(a) u = (21, i , 31), v = (1, 61, k) 

(b) u = (k, k, 1 + 1), v = (1, —1, 1 — 1) 

28. Show that if A is a real n x n matrix and x is a column vector 
in C", then Re(Ax) = A(Re(x)) and Im(Ax) = A(Im(x)). 


29. The matrices 




"0 f 
1 0 


a 2 


0 -f 

1 0 


&3 


"1 0 " 
0 -1 


called Pauli spin matrices , are used in quantum mechanics to 
study particle spin. The Dirac matrices , which are also used in 
quantum mechanics, are expressed in terms of the Pauli spin 
matrices and the 2x2 identity matrix I 2 as 



'0 

Cl' 

, a* = 

_C| 

0_ 



'0 

a{ 


'0 

03 ' 

a v = 



, ct z = 



y 

. ff 2 

0 _ 


_< t 3 

0 _ 


(a) Show that fi 2 = a 3 = a 2 = a 2 . 

(b) Matrices A and B for which AB = —BA are said to be 
anticommutatire. Show that the Dirac matrices are anti- 
commutative. 


30. If k is a real scalar and v is a vector in R^n, then Theorem 3.2.1
states that ||kv|| = |k| ||v||. Is this relationship also true if k is
a complex scalar and v is a vector in C^n? Justify your answer.


Working with Proofs

31. Prove part (c) of Theorem 5.3.1.

32. Prove Theorem 5.3.2.

33. Prove that if u and v are vectors in C^n, then

u \cdot v = \frac{1}{4}\|u + v\|^2 - \frac{1}{4}\|u - v\|^2
          + \frac{i}{4}\|u + iv\|^2 - \frac{i}{4}\|u - iv\|^2

34. It follows from Theorem 5.3.7 that the eigenvalues of the
rotation matrix

R_\phi = \begin{bmatrix} \cos\phi & -\sin\phi \\ \sin\phi & \cos\phi \end{bmatrix}

are \lambda = \cos\phi \pm i\sin\phi. Prove that if x is an eigenvector
corresponding to either eigenvalue, then Re(x) and Im(x) are
orthogonal and have the same length. [Note: This implies that
P = [Re(x) | Im(x)] is a real scalar multiple of an orthogonal
matrix.]

35. The two parts of this exercise lead you through a proof of
Theorem 5.3.8.

(a) For notational simplicity, let

M = \begin{bmatrix} a & -b \\ b & a \end{bmatrix}

and let u = Re(x) and v = Im(x), so P = [u | v]. Show
that the relationship Ax = \lambda x implies that

Ax = (au + bv) + i(-bu + av)

and then equate real and imaginary parts in this equation
to show that

AP = [Au | Av] = [au + bv | -bu + av] = PM

(b) Show that P is invertible, thereby completing the proof,
since the result in part (a) implies that A = PMP^{-1}.
[Hint: If P is not invertible, then one of its column
vectors is a real scalar multiple of the other, say
v = cu. Substitute this into the equations Au = au + bv
and Av = -bu + av obtained in part (a), and show that
(1 + c^2)bu = 0. Finally, show that this leads to a
contradiction, thereby proving that P is invertible.]

36. In this problem you will prove the complex analog of the
Cauchy-Schwarz inequality.

(a) Prove: If k is a complex number, and u and v are vectors
in C^n, then

(u - kv) \cdot (u - kv) = u \cdot u - \bar{k}(u \cdot v) - k\overline{(u \cdot v)} + k\bar{k}(v \cdot v)

(b) Use the result in part (a) to prove that

0 \le u \cdot u - \bar{k}(u \cdot v) - k\overline{(u \cdot v)} + k\bar{k}(v \cdot v)

(c) Take k = (u \cdot v)/(v \cdot v) in part (b) to prove that

|u \cdot v| \le \|u\| \|v\|

True-False Exercises 

TF. In parts (a)-(f) determine whether the statement is true or 

false, and justify your answer. 

(a) There is a real 5x5 matrix with no real eigenvalues. 

(b) The eigenvalues of a 2 x 2 complex matrix A are the solutions
of the equation \lambda^2 - tr(A)\lambda + det(A) = 0.

(c) A 2 x 2 matrix A with real entries has two distinct eigenvalues
if and only if tr(A)^2 \neq 4 det(A).

(d) If \lambda is a complex eigenvalue of a real matrix A with a
corresponding complex eigenvector v, then \bar{\lambda} is a complex
eigenvalue of A and \bar{v} is a complex eigenvector of A corresponding
to \bar{\lambda}.

(e) Every eigenvalue of a complex symmetric matrix is real.

(f) If a 2 x 2 real matrix A has complex eigenvalues and x_0 is a
vector in R^2, then the vectors x_0, Ax_0, A^2 x_0, ..., A^n x_0, ... lie
on an ellipse.
5.4 Differential Equations

Many laws of physics, chemistry, biology, engineering, and economics are described in 
terms of “differential equations” — that is, equations involving functions and their 
derivatives. In this section we will illustrate one way in which matrix diagonalization can be 
used to solve systems of differential equations. Calculus is a prerequisite for this section. 


Terminology

Recall from calculus that a differential equation is an equation involving unknown
functions and their derivatives. The order of a differential equation is the order of the
highest derivative it contains. The simplest differential equations are the first-order
equations of the form

y' = ay    (1)

where y = f(x) is an unknown differentiable function to be determined, y' = dy/dx is
its derivative, and a is a constant. As with most differential equations, this equation has
infinitely many solutions; they are the functions of the form

y = ce^{ax}    (2)

where c is an arbitrary constant. That every function of this form is a solution of (1)
follows from the computation

y' = cae^{ax} = ay

and that these are the only solutions is shown in the exercises. Accordingly, we call (2) the
general solution of (1). As an example, the general solution of the differential equation
y' = 5y is

y = ce^{5x}    (3)

Often, a physical problem that leads to a differential equation imposes some conditions
that enable us to isolate one particular solution from the general solution. For example,
if we require that solution (3) of the equation y' = 5y satisfy the added condition

y(0) = 6    (4)

(that is, y = 6 when x = 0), then on substituting these values in (3), we obtain
6 = ce^0 = c, from which we conclude that

y = 6e^{5x}

is the only solution of y' = 5y that satisfies (4).

A condition such as (4), which specifies the value of the general solution at a point, 
is called an initial condition, and the problem of solving a differential equation subject 
to an initial condition is called an initial-value problem. 
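
The initial-value problem above can be checked numerically. The following is a minimal sketch, not part of the text's method: the function names solve_scalar_ivp and derivative are illustrative, and the derivative is only estimated by a central difference, so the check is approximate.

```python
import math

def solve_scalar_ivp(a, y0):
    """Return y(x) = y0 * e^(a x), the solution of y' = a y with y(0) = y0."""
    return lambda x: y0 * math.exp(a * x)

def derivative(f, x, h=1e-6):
    """Central-difference estimate of f'(x) (approximate, for checking only)."""
    return (f(x + h) - f(x - h)) / (2 * h)

# The example from the text: y' = 5y with y(0) = 6.
y = solve_scalar_ivp(5, 6)
for x in (0.0, 0.1, 0.5):
    # Relative size of the residual y'(x) - 5 y(x); it should be near zero.
    residual = abs(derivative(y, x) - 5 * y(x)) / abs(y(x))
    print(x, y(x), residual)
```

The initial condition fixes the constant c in the general solution, which is why the helper takes y0 directly instead of c.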


First-Order Linear Systems 


In this section we will be concerned with solving systems of differential equations of the 
form 

y_1' = a_{11}y_1 + a_{12}y_2 + \cdots + a_{1n}y_n
y_2' = a_{21}y_1 + a_{22}y_2 + \cdots + a_{2n}y_n
  \vdots
y_n' = a_{n1}y_1 + a_{n2}y_2 + \cdots + a_{nn}y_n    (5)

where y_1 = f_1(x), y_2 = f_2(x), ..., y_n = f_n(x) are functions to be determined, and the
a_{ij}'s are constants. In matrix notation, (5) can be written as

\begin{bmatrix} y_1' \\ y_2' \\ \vdots \\ y_n' \end{bmatrix}
=
\begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots &        & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nn}
\end{bmatrix}
\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}

or more briefly as

y' = Ay    (6)

where the notation y' denotes the vector obtained by differentiating each component
of y.

We call (5) or its matrix form (6) a constant coefficient first-order homogeneous linear 
system. It is of first order because all derivatives are of that order, it is linear because dif- 
ferentiation and matrix multiplication are linear transformations, and it is homogeneous 
because 

y_1 = y_2 = \cdots = y_n = 0

is a solution regardless of the values of the coefficients. As expected, this is called the 
trivial solution. In this section we will work primarily with the matrix form. Here is an 
example. 


EXAMPLE 1 Solution of a Linear System with Initial Conditions 

(a) Write the following system in matrix form:

y_1' = 3y_1
y_2' = -2y_2    (7)
y_3' = 5y_3

(b) Solve the system.

(c) Find a solution of the system that satisfies the initial conditions y_1(0) = 1,
y_2(0) = 4, and y_3(0) = -2.


Solution (a)

\begin{bmatrix} y_1' \\ y_2' \\ y_3' \end{bmatrix}
=
\begin{bmatrix} 3 & 0 & 0 \\ 0 & -2 & 0 \\ 0 & 0 & 5 \end{bmatrix}
\begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix}    (8)

or

y' = \begin{bmatrix} 3 & 0 & 0 \\ 0 & -2 & 0 \\ 0 & 0 & 5 \end{bmatrix} y    (9)


Solution (b) Because each equation in (7) involves only one unknown function, we can
solve the equations individually. It follows from (2) that these solutions are

y_1 = c_1 e^{3x}
y_2 = c_2 e^{-2x}
y_3 = c_3 e^{5x}

or, in matrix notation,

y = \begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix}
  = \begin{bmatrix} c_1 e^{3x} \\ c_2 e^{-2x} \\ c_3 e^{5x} \end{bmatrix}    (10)


Solution (c) From the given initial conditions, we obtain

 1 = y_1(0) = c_1 e^0 = c_1
 4 = y_2(0) = c_2 e^0 = c_2
-2 = y_3(0) = c_3 e^0 = c_3

so the solution satisfying these conditions is

y_1 = e^{3x},  y_2 = 4e^{-2x},  y_3 = -2e^{5x}

or, in matrix notation,

y = \begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix}
  = \begin{bmatrix} e^{3x} \\ 4e^{-2x} \\ -2e^{5x} \end{bmatrix}
Solution by Diagonalization 


What made the system in Example 1 easy to solve was the fact that each equation involved 
only one of the unknown functions, so its matrix formulation, y' = Ay, had a diagonal 
coefficient matrix A [Formula (9)]. A more complicated situation occurs when some or 
all of the equations in the system involve more than one of the unknown functions, for 
in this case the coefficient matrix is not diagonal. Let us now consider how we might 
solve such a system. 

The basic idea for solving a system y' = Ay whose coefficient matrix A is not diagonal 
is to introduce a new unknown vector u that is related to the unknown vector y by an 
equation of the form y = Pu in which P is an invertible matrix that diagonalizes A. 
Of course, such a matrix may or may not exist, but if it does, then we can rewrite the 
equation y' = Ay as

Pu' = A(Pu)

or alternatively as

u' = (P^{-1}AP)u

Since P is assumed to diagonalize A, this equation has the form

u' = Du

where D is diagonal. We can now solve this equation for u using the method of Example
1, and then obtain y by matrix multiplication using the relationship y = Pu.

In summary, we have the following procedure for solving a system y' = Ay in the
case where A is diagonalizable.


A Procedure for Solving y' = Ay If A Is Diagonalizable 

Step 1. Find a matrix P that diagonalizes A. 

Step 2. Make the substitutions y = Pu and y' = Pu' to obtain a new “diagonal
system” u' = Du, where D = P^{-1}AP.

Step 3. Solve u' = Du. 

Step 4. Determine y from the equation y = Pu. 


► EXAMPLE 2 Solution Using Diagonalization 

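(a) Solve the system

y_1' = y_1 + y_2
y_2' = 4y_1 - 2y_2

(b) Find the solution that satisfies the initial conditions y_1(0) = 1, y_2(0) = 6.

Solution (a) The coefficient matrix for the system is

A = \begin{bmatrix} 1 & 1 \\ 4 & -2 \end{bmatrix}

As discussed in Section 5.2, A will be diagonalized by any matrix P whose columns are
linearly independent eigenvectors of A. Since

\det(\lambda I - A) = \begin{vmatrix} \lambda - 1 & -1 \\ -4 & \lambda + 2 \end{vmatrix}
                    = \lambda^2 + \lambda - 6 = (\lambda + 3)(\lambda - 2)

the eigenvalues of A are \lambda = 2 and \lambda = -3. By definition,

x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}

is an eigenvector of A corresponding to \lambda if and only if x is a nontrivial solution of

\begin{bmatrix} \lambda - 1 & -1 \\ -4 & \lambda + 2 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}
= \begin{bmatrix} 0 \\ 0 \end{bmatrix}

If \lambda = 2, this system becomes

\begin{bmatrix} 1 & -1 \\ -4 & 4 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}
= \begin{bmatrix} 0 \\ 0 \end{bmatrix}

Solving this system yields x_1 = t, x_2 = t, so

\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} t \\ t \end{bmatrix}
= t \begin{bmatrix} 1 \\ 1 \end{bmatrix}

Thus,

p_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}

is a basis for the eigenspace corresponding to \lambda = 2. Similarly, you can show that

p_2 = \begin{bmatrix} -1/4 \\ 1 \end{bmatrix}

is a basis for the eigenspace corresponding to \lambda = -3. Thus,

P = \begin{bmatrix} 1 & -1/4 \\ 1 & 1 \end{bmatrix}

diagonalizes A, and

D = P^{-1}AP = \begin{bmatrix} 2 & 0 \\ 0 & -3 \end{bmatrix}

Thus, as noted in Step 2 of the procedure stated above, the substitution

y = Pu and y' = Pu'

yields the “diagonal system”

u' = Du = \begin{bmatrix} 2 & 0 \\ 0 & -3 \end{bmatrix} u
\quad or \quad
u_1' = 2u_1
u_2' = -3u_2

From (2) the solution of this system is

u_1 = c_1 e^{2x}
u_2 = c_2 e^{-3x}
\quad or \quad
u = \begin{bmatrix} c_1 e^{2x} \\ c_2 e^{-3x} \end{bmatrix}

so the equation y = Pu yields, as the solution for y,

y = \begin{bmatrix} y_1 \\ y_2 \end{bmatrix}
  = \begin{bmatrix} 1 & -1/4 \\ 1 & 1 \end{bmatrix}
    \begin{bmatrix} c_1 e^{2x} \\ c_2 e^{-3x} \end{bmatrix}
  = \begin{bmatrix} c_1 e^{2x} - \frac{1}{4} c_2 e^{-3x} \\ c_1 e^{2x} + c_2 e^{-3x} \end{bmatrix}

or

y_1 = c_1 e^{2x} - \frac{1}{4} c_2 e^{-3x}
y_2 = c_1 e^{2x} + c_2 e^{-3x}    (11)

Solution (b) If we substitute the given initial conditions in (11), we obtain

c_1 - \frac{1}{4} c_2 = 1
c_1 + c_2 = 6

Solving this system, we obtain c_1 = 2, c_2 = 4, so it follows from (11) that the solution
satisfying the initial conditions is

y_1 = 2e^{2x} - e^{-3x}
y_2 = 2e^{2x} + 4e^{-3x}  ◄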
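
The four-step procedure can be traced in code for the 2 x 2 case. What follows is a minimal sketch under stated assumptions: it handles only 2 x 2 matrices with two distinct real eigenvalues, it finds the eigenvalues from the quadratic characteristic polynomial, and the names eigenpairs_2x2 and solve_linear_system are illustrative. The eigenvectors it picks may be scalar multiples of p_1 and p_2 in Example 2, which changes the constants c_1, c_2 but not the final solution.

```python
import math

def eigenpairs_2x2(A):
    """Step 1: eigenvalues and eigenvectors of a 2x2 matrix (distinct real case only)."""
    (a, b), (c, d) = A
    tr, det = a + d, a * d - b * c
    disc = tr * tr - 4 * det
    if disc <= 0:
        raise ValueError("this sketch handles only distinct real eigenvalues")
    root = math.sqrt(disc)
    pairs = []
    for lam in ((tr + root) / 2, (tr - root) / 2):
        # Pick a nonzero solution of (A - lam I) v = 0.
        if b != 0:
            vec = (b, lam - a)
        elif c != 0:
            vec = (lam - d, c)
        else:  # A is already diagonal
            vec = (1.0, 0.0) if abs(lam - a) < abs(lam - d) else (0.0, 1.0)
        pairs.append((lam, vec))
    return pairs

def solve_linear_system(A, y0):
    """Steps 2-4: solve u' = Du, fit the constants to y(0) = y0, return y = Pu."""
    (lam1, (p11, p21)), (lam2, (p12, p22)) = eigenpairs_2x2(A)
    det_p = p11 * p22 - p12 * p21
    # Solve P c = y0 for the constants by Cramer's rule.
    c1 = (y0[0] * p22 - p12 * y0[1]) / det_p
    c2 = (p11 * y0[1] - p21 * y0[0]) / det_p

    def y(x):
        u1 = c1 * math.exp(lam1 * x)   # solution of u1' = lam1 * u1
        u2 = c2 * math.exp(lam2 * x)   # solution of u2' = lam2 * u2
        return (p11 * u1 + p12 * u2, p21 * u1 + p22 * u2)
    return y

# Example 2: y1' = y1 + y2, y2' = 4y1 - 2y2 with y1(0) = 1, y2(0) = 6.
y = solve_linear_system([[1, 1], [4, -2]], (1, 6))
print(y(0.5))
print((2 * math.exp(1.0) - math.exp(-1.5), 2 * math.exp(1.0) + 4 * math.exp(-1.5)))
```

The two printed pairs should agree up to rounding, since the second is the closed-form solution from part (b) evaluated at x = 0.5.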


Remark Keep in mind that the method of Example 2 works because the coefficient matrix of 
the system is diagonalizable. In cases where this is not so, other methods are required. These are 
typically discussed in books devoted to differential equations. 


Exercise Set 5.4 

1. (a) Solve the system

y_1' = y_1 + 4y_2
y_2' = 2y_1 + 3y_2

(b) Find the solution that satisfies the initial conditions
y_1(0) = 0, y_2(0) = 0.

2. (a) Solve the system

y_1' = y_1 + 3y_2
y_2' = 4y_1 + 5y_2

(b) Find the solution that satisfies the conditions y_1(0) = 2,
y_2(0) = 1.

3. (a) Solve the system

y_1' = 4y_1 + y_3
y_2' = -2y_1 + y_2
y_3' = -2y_1 + y_3

(b) Find the solution that satisfies the initial conditions
y_1(0) = -1, y_2(0) = 1, y_3(0) = 0.

4. Solve the system

y_1' = 4y_1 + 2y_2 + 2y_3
y_2' = 2y_1 + 4y_2 + 2y_3
y_3' = 2y_1 + 2y_2 + 4y_3

5. Show that every solution of y' = ay has the form y = ce^{ax}.
[Hint: Let y = f(x) be a solution of the equation, and show
that f(x)e^{-ax} is constant.]

6. Show that if A is diagonalizable and

y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}

is a solution of the system y' = Ay, then each y_i is a linear
combination of e^{\lambda_1 x}, e^{\lambda_2 x}, ..., e^{\lambda_n x}, where
\lambda_1, \lambda_2, ..., \lambda_n are the eigenvalues of A.


7. Sometimes it is possible to solve a single higher-order linear 
differential equation with constant coefficients by expressing 
it as a system and applying the methods of this section. For 
the differential equation y'' - y' - 6y = 0, show that the
substitutions y_1 = y and y_2 = y' lead to the system

y_1' = y_2
y_2' = 6y_1 + y_2

Solve this system, and use the result to solve the original
differential equation.

8. Use the procedure in Exercise 7 to solve y'' + y' - 12y = 0.

9. Explain how you might use the procedure in Exercise 7 to solve
y''' - 6y'' + 11y' - 6y = 0. Use that procedure to solve the
equation.

10. Solve the nondiagonalizable system

y_1' = y_1 + y_2
y_2' = y_2

[Hint: Solve the second equation for y_2, substitute in the first
equation, and then multiply both sides of the resulting
equation by e^{-x}.]

11. Consider a system of differential equations y' = Ay, where A
is a 2 x 2 matrix. For what values of a_{11}, a_{12}, a_{21}, a_{22} do the
component solutions y_1(t), y_2(t) tend to zero as t -> \infty? In
particular, what must be true about the determinant and the
trace of A for this to happen?

12. (a) By rewriting (11) in matrix form, show that the solution
of the system in Example 2 can be expressed as

y = c_1 e^{2x} \begin{bmatrix} 1 \\ 1 \end{bmatrix}
  + c_2 e^{-3x} \begin{bmatrix} -1/4 \\ 1 \end{bmatrix}

This is called the general solution of the system.

(b) Note that in part (a), the vector in the first term is an
eigenvector corresponding to the eigenvalue \lambda_1 = 2, and
the vector in the second term is an eigenvector corresponding
to the eigenvalue \lambda_2 = -3. This is a special case of the
following general result:


Theorem. If the coefficient matrix A of the system y' = Ay is
diagonalizable, then the general solution of the system can be
expressed as

y = c_1 e^{\lambda_1 x} x_1 + c_2 e^{\lambda_2 x} x_2 + \cdots + c_n e^{\lambda_n x} x_n

where \lambda_1, \lambda_2, ..., \lambda_n are the eigenvalues of A, and x_i is an
eigenvector of A corresponding to \lambda_i.


13. The electrical circuit in the accompanying figure is called a
parallel LRC circuit; it contains a resistor with resistance
R ohms (\Omega), an inductor with inductance L henries (H), and
a capacitor with capacitance C farads (F). It is shown in
electrical circuit analysis that at time t the current i_L through the
inductor and the voltage v_C across the capacitor are solutions
of the system

\begin{bmatrix} i_L'(t) \\ v_C'(t) \end{bmatrix}
=
\begin{bmatrix} 0 & 1/L \\ -1/C & -1/(RC) \end{bmatrix}
\begin{bmatrix} i_L(t) \\ v_C(t) \end{bmatrix}

(a) Find the general solution of this system in the case where
R = 1 ohm, L = 1 henry, and C = 0.5 farad.

(b) Find i_L(t) and v_C(t) subject to the initial conditions
i_L(0) = 2 amperes and v_C(0) = 1 volt.

(c) What can you say about the current and voltage in part (b)
over the “long term” (that is, as t -> \infty)?


◄ Figure Ex-13 (circuit with capacitor C, resistor R, and inductor L)


In Exercises 14-15, a mapping

L: C^\infty(-\infty, \infty) -> C^\infty(-\infty, \infty)

is given.

(a) Show that L is a linear operator.

(b) Use the ideas in Exercises 7 and 9 to solve the differential
equation L(y) = 0.

14. L(y) = y'' + 2y' - 3y

15. L(y) = y''' - 2y'' - y' + 2y


Working with Proofs 

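16. Prove the theorem in Exercise 12 by tracing through the
four-step procedure preceding Example 2 with

D = \begin{bmatrix}
\lambda_1 & 0 & \cdots & 0 \\
0 & \lambda_2 & \cdots & 0 \\
\vdots & \vdots & & \vdots \\
0 & 0 & \cdots & \lambda_n
\end{bmatrix}
and P = [x_1 | x_2 | \cdots | x_n]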


True-False Exercises 

TF. In parts (a)-(e) determine whether the statement is true or 

false, and justify your answer. 

(a) Every system of differential equations y' = Ay has a solution. 

(b) If x' = Ax and y' = Ay, then x = y. 

(c) If x' = Ax and y' = Ay, then (cx + dy)' = A(cx + dy) for
all scalars c and d.

(d) If A is a square matrix with distinct real eigenvalues, then it is
possible to solve x' = Ax by diagonalization.

(e) If A and P are similar matrices, then y' = Ay and u' = Pu
have the same solutions.
Working with Technology

T1. (a) Find the general solution of the following system by
computing appropriate eigenvalues and eigenvectors.

y_1' = 3y_1 + 2y_2 + 2y_3
y_2' = y_1 + 4y_2 + y_3
y_3' = -2y_1 - 4y_2 - y_3

(b) Find the solution that satisfies the initial conditions y_1(0) = 0,
y_2(0) = 1, y_3(0) = -3. [Technology not required.]

T2. It is shown in electrical circuit theory that for the LRC circuit
in Figure Ex-13 the current I in amperes (A) through the inductor
and the voltage drop V in volts (V) across the capacitor satisfy the
system of differential equations

dI/dt = V/L

dV/dt = -I/C - V/(RC)

where the derivatives are with respect to the time t. Find I and
V as functions of t if L = 0.5 H, C = 0.2 F, R = 2 \Omega, and the
initial values of V and I are V(0) = 1 V and I(0) = 2 A.


5.5 Dynamical Systems and Markov Chains 

In this optional section we will show how matrix methods can be used to analyze the 
behavior of physical systems that evolve over time. The methods that we will study here 
have been applied to problems in business, ecology, demographics, sociology, and most of 
the physical sciences. 

Dynamical Systems

A dynamical system is a finite set of variables whose values change with time. The value
of a variable at a point in time is called the state of the variable at that time, and the vector 
formed from these states is called the state vector of the dynamical system at that time. 
Our primary objective in this section is to analyze how the state vector of a dynamical 
system changes with time. Let us begin with an example. 



▲ Figure 5.5.1 Channel 1 loses 20% and holds 80%. Channel 2 loses 10%
and holds 90%.


► EXAMPLE 1 Market Share as a Dynamical System 

Suppose that two competing television channels, channel 1 and channel 2, each have 50% 
of the viewer market at some initial point in time. Assume that over each one-year period 
channel 1 captures 10% of channel 2's share, and channel 2 captures 20% of channel 1's
share (see Figure 5.5.1). What is each channel's market share after one year?


Solution Let us begin by introducing the time-dependent variables 

x_1(t) = fraction of the market held by channel 1 at time t
x_2(t) = fraction of the market held by channel 2 at time t

and the column vector

x(t) = \begin{bmatrix} x_1(t) \\ x_2(t) \end{bmatrix}
    <- Channel 1's fraction of the market at time t in years
    <- Channel 2's fraction of the market at time t in years

The variables x_1(t) and x_2(t) form a dynamical system whose state at time t is the vector
x(t). If we take t = 0 to be the starting point at which the two channels had 50% of the
market, then the state of the system at that time is

x(0) = \begin{bmatrix} x_1(0) \\ x_2(0) \end{bmatrix}
     = \begin{bmatrix} 0.5 \\ 0.5 \end{bmatrix}    (1)
    <- Channel 1's fraction of the market at time t = 0
    <- Channel 2's fraction of the market at time t = 0

Now let us try to find the state of the system at time t = 1 (one year later). Over the
one-year period, channel 1 retains 80% of its initial 50%, and it gains 10% of channel 2's
initial 50%. Thus,

x_1(1) = 0.8(0.5) + 0.1(0.5) = 0.45    (2)

Similarly, channel 2 gains 20% of channel 1's initial 50%, and retains 90% of its initial
50%. Thus,

x_2(1) = 0.2(0.5) + 0.9(0.5) = 0.55    (3)

Therefore, the state of the system at time t = 1 is

x(1) = \begin{bmatrix} x_1(1) \\ x_2(1) \end{bmatrix}
     = \begin{bmatrix} 0.45 \\ 0.55 \end{bmatrix}    (4)
    <- Channel 1's fraction of the market at time t = 1
    <- Channel 2's fraction of the market at time t = 1


► EXAMPLE 2 Evolution of Market Share over Five Years 

Track the market shares of channels 1 and 2 in Example 1 over a five-year period. 

Solution To solve this problem suppose that we have already computed the market
share of each channel at time t = k and we are interested in using the known values of
x_1(k) and x_2(k) to compute the market shares x_1(k + 1) and x_2(k + 1) one year later.
The analysis is exactly the same as that used to obtain Equations (2) and (3). Over the
one-year period, channel 1 retains 80% of its starting fraction x_1(k) and gains 10% of
channel 2's starting fraction x_2(k). Thus,

x_1(k + 1) = (0.8)x_1(k) + (0.1)x_2(k)    (5)

Similarly, channel 2 gains 20% of channel 1's starting fraction x_1(k) and retains 90% of
its own starting fraction x_2(k). Thus,

x_2(k + 1) = (0.2)x_1(k) + (0.9)x_2(k)    (6)

Equations (5) and (6) can be expressed in matrix form as

\begin{bmatrix} x_1(k + 1) \\ x_2(k + 1) \end{bmatrix}
=
\begin{bmatrix} 0.8 & 0.1 \\ 0.2 & 0.9 \end{bmatrix}
\begin{bmatrix} x_1(k) \\ x_2(k) \end{bmatrix}    (7)

which provides a way of using matrix multiplication to compute the state of the system
at time t = k + 1 from the state at time t = k. For example, using (1) and (7) we obtain

x(1) = \begin{bmatrix} 0.8 & 0.1 \\ 0.2 & 0.9 \end{bmatrix} x(0)
     = \begin{bmatrix} 0.8 & 0.1 \\ 0.2 & 0.9 \end{bmatrix}
       \begin{bmatrix} 0.5 \\ 0.5 \end{bmatrix}
     = \begin{bmatrix} 0.45 \\ 0.55 \end{bmatrix}

which agrees with (4). Similarly,

x(2) = \begin{bmatrix} 0.8 & 0.1 \\ 0.2 & 0.9 \end{bmatrix} x(1)
     = \begin{bmatrix} 0.8 & 0.1 \\ 0.2 & 0.9 \end{bmatrix}
       \begin{bmatrix} 0.45 \\ 0.55 \end{bmatrix}
     = \begin{bmatrix} 0.415 \\ 0.585 \end{bmatrix}

We can now continue this process, using Formula (7) to compute x(3) from x(2), then
x(4) from x(3), and so on. This yields (verify)

x(3) = \begin{bmatrix} 0.3905 \\ 0.6095 \end{bmatrix}, \quad
x(4) = \begin{bmatrix} 0.37335 \\ 0.62665 \end{bmatrix}, \quad
x(5) = \begin{bmatrix} 0.361345 \\ 0.638655 \end{bmatrix}    (8)


Thus, after five years, channel 1 will hold about 36% of the market and channel 2 will 
hold about 64% of the market. 


If desired, we can continue the market analysis in the last example beyond the five- 
year period and explore what happens to the market share over the long term. We did 
so, using a computer, and obtained the following state vectors (rounded to six decimal 
places): 


x(10) = \begin{bmatrix} 0.338041 \\ 0.661959 \end{bmatrix}, \quad
x(20) = \begin{bmatrix} 0.333466 \\ 0.666534 \end{bmatrix}, \quad
x(40) = \begin{bmatrix} 0.333333 \\ 0.666667 \end{bmatrix}    (9)

All subsequent state vectors, when rounded to six decimal places, are the same as x(40),
so we see that the market shares eventually stabilize with channel 1 holding about one- 
third of the market and channel 2 holding about two-thirds. Later in this section, we 
will explain why this stabilization occurs. 
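
The year-by-year computation in Example 2 is a repeated matrix-vector product, which can be sketched in a few lines. The helper names mat_vec and evolve are illustrative, and the results are floating-point approximations of the values in (8) and (9).

```python
def mat_vec(P, x):
    """Matrix-vector product Px for a matrix stored as a list of rows."""
    return [sum(p * xj for p, xj in zip(row, x)) for row in P]

def evolve(P, x0, steps):
    """Return the list [x(0), x(1), ..., x(steps)] with x(k + 1) = P x(k)."""
    states = [list(x0)]
    for _ in range(steps):
        states.append(mat_vec(P, states[-1]))
    return states

# Transition matrix from Formula (7) and initial state from Formula (1).
P = [[0.8, 0.1],
     [0.2, 0.9]]
states = evolve(P, [0.5, 0.5], 40)
for k in (1, 2, 5, 10, 20, 40):
    print(k, [round(v, 6) for v in states[k]])
```

Keeping every intermediate state makes it easy to see the entries settle toward one-third and two-thirds as k grows.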


In many dynamical systems the states of the variables are not known with certainty but 
can be expressed as probabilities; such dynamical systems are called stochastic processes 
(from the Greek word stochastikos, meaning “proceeding by guesswork”). A detailed 
study of stochastic processes requires a precise definition of the term probability, which 
is outside the scope of this course. However, the following interpretation will suffice for 
our present purposes: 


Stated informally, the probability that an experiment or observation will have a certain 
outcome is the fraction of the time that the outcome would occur if the experiment could 
be repeated indefinitely under constant conditions — the greater the number of actual 
repetitions, the more accurately the probability describes the fraction of time that the 
outcome occurs. 


For example, when we say that the probability of tossing heads with a fair coin is 1/2,
we mean that if the coin were tossed many times under constant conditions, then we
would expect about half of the outcomes to be heads. Probabilities are often expressed 
as decimals or percentages. Thus, the probability of tossing heads with a fair coin can 
also be expressed as 0.5 or 50%. 

If an experiment or observation has n possible outcomes, then the probabilities of 
those outcomes must be nonnegative fractions whose sum is 1. The probabilities are 
nonnegative because each describes the fraction of occurrences of an outcome over the 
long term, and the sum is 1 because they account for all possible outcomes. For example, 
if a box containing 10 balls has one red ball, three green balls, and six yellow balls, and 
if a ball is drawn at random from the box, then the probabilities of the various outcomes 
are 

p_1 = prob(red) = 1/10 = 0.1
p_2 = prob(green) = 3/10 = 0.3
p_3 = prob(yellow) = 6/10 = 0.6

Each probability is a nonnegative fraction and

p_1 + p_2 + p_3 = 0.1 + 0.3 + 0.6 = 1


In a stochastic process with n possible states, the state vector at each time t has the
form

x(t) = \begin{bmatrix} x_1(t) \\ x_2(t) \\ \vdots \\ x_n(t) \end{bmatrix}
    <- Probability that the system is in state 1
    <- Probability that the system is in state 2
       \vdots
    <- Probability that the system is in state n

The entries in this vector must add up to 1 since they account for all n possibilities. In 
general, a vector with nonnegative entries that add up to 1 is called a probability vector. 


► EXAMPLE 3 Example 1 Revisited from the Probability Viewpoint 

Observe that the state vectors in Examples 1 and 2 are all probability vectors. This is to
be expected since the entries in each state vector are the fractional market shares of the
channels, and together they account for the entire market. In practice, it is preferable
to interpret the entries in the state vectors as probabilities rather than exact market
fractions, since market information is usually obtained by statistical sampling procedures
with intrinsic uncertainties. Thus, for example, the state vector

x(1) = \begin{bmatrix} x_1(1) \\ x_2(1) \end{bmatrix}
     = \begin{bmatrix} 0.45 \\ 0.55 \end{bmatrix}

which we interpreted in Example 1 to mean that channel 1 has 45% of the market and
channel 2 has 55%, can also be interpreted to mean that an individual picked at random
from the market will be a channel 1 viewer with probability 0.45 and a channel 2 viewer
with probability 0.55.


▲ Figure 5.5.2 The entry p_{ij} is the probability that the system is in state i at
time t = k + 1 if it is in state j at time t = k.


A square matrix, each of whose columns is a probability vector, is called a stochastic 
matrix. Such matrices commonly occur in formulas that relate successive states of a 
stochastic process. For example, the state vectors x(k + 1) and x(k) in (7) are related by 
an equation of the form x(k + 1) = Px(k) in which

P = \begin{bmatrix} 0.8 & 0.1 \\ 0.2 & 0.9 \end{bmatrix}    (10)

is a stochastic matrix. It should not be surprising that the column vectors of P are
probability vectors, since the entries in each column provide a breakdown of what happens
to each channel's market share over the year: the entries in column 1 convey that each
year channel 1 retains 80% of its market share and loses 20%, and the entries in column
2 convey that each year channel 2 retains 90% of its market share and loses 10%. The
entries in (10) can also be viewed as probabilities:

p_{11} = 0.8 = probability that a channel 1 viewer remains a channel 1 viewer
p_{21} = 0.2 = probability that a channel 1 viewer becomes a channel 2 viewer
p_{12} = 0.1 = probability that a channel 2 viewer becomes a channel 1 viewer
p_{22} = 0.9 = probability that a channel 2 viewer remains a channel 2 viewer
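
The definitions of probability vector and stochastic matrix translate directly into checks on the entries. The sketch below is illustrative only: the function names and the tolerance tol (used because decimal entries are stored as floating-point numbers) are assumptions, not part of the text.

```python
def is_probability_vector(v, tol=1e-12):
    """True if every entry is nonnegative and the entries add up to 1."""
    return all(entry >= 0 for entry in v) and abs(sum(v) - 1.0) <= tol

def is_stochastic(P, tol=1e-12):
    """True if P is square and each of its columns is a probability vector."""
    n = len(P)
    if any(len(row) != n for row in P):
        return False
    columns = [[P[i][j] for i in range(n)] for j in range(n)]
    return all(is_probability_vector(col, tol) for col in columns)

P = [[0.8, 0.1],
     [0.2, 0.9]]
print(is_stochastic(P))                          # columns sum to 1
print(is_stochastic([[0.8, 0.2], [0.1, 0.9]]))   # rows sum to 1, columns do not
```

The second call uses the transpose of P, which illustrates that the convention here is about columns, not rows.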

Example 1 is a special case of a large class of stochastic processes called Markov 
chains. 


DEFINITION 1 A Markov chain is a dynamical system whose state vectors at a succes- 
sion of equally spaced times are probability vectors and for which the state vectors at 
successive times are related by an equation of the form 

x(k + 1) = Px(k) 

in which P = [p_{ij}] is a stochastic matrix and p_{ij} is the probability that the system
will be in state i at time t = k + 1 if it is in state j at time t = k. The matrix P is
called the transition matrix for the system. 


WARNING Note that in this definition the row index i corresponds to the later state and the 
column index j to the earlier state (Figure 5.5.2). 
▲ Figure 5.5.3


► EXAMPLE 4 Wildlife Migration as a Markov Chain 

Suppose that a tagged lion can migrate over three adjacent game reserves in search 
of food, reserve 1, reserve 2, and reserve 3. Based on data about the food resources, 
researchers conclude that the monthly migration pattern of the lion can be modeled by 
a Markov chain with transition matrix 


P = \begin{bmatrix}
0.5 & 0.4 & 0.6 \\
0.2 & 0.2 & 0.3 \\
0.3 & 0.4 & 0.1
\end{bmatrix}

in which column j (j = 1, 2, 3) corresponds to the reserve at time t = k and row i
(i = 1, 2, 3) corresponds to the reserve at time t = k + 1 (see Figure 5.5.3). That is,

p_{11} = 0.5 = probability that the lion will stay in reserve 1 when it is in reserve 1
p_{12} = 0.4 = probability that the lion will move from reserve 2 to reserve 1
p_{13} = 0.6 = probability that the lion will move from reserve 3 to reserve 1
p_{21} = 0.2 = probability that the lion will move from reserve 1 to reserve 2
p_{22} = 0.2 = probability that the lion will stay in reserve 2 when it is in reserve 2
p_{23} = 0.3 = probability that the lion will move from reserve 3 to reserve 2
p_{31} = 0.3 = probability that the lion will move from reserve 1 to reserve 3
p_{32} = 0.4 = probability that the lion will move from reserve 2 to reserve 3
p_{33} = 0.1 = probability that the lion will stay in reserve 3 when it is in reserve 3

Assuming that t is in months and the lion is released in reserve 2 at time t = 0, track its
probable locations over a six-month period.


Solution Let x_1(k), x_2(k), and x_3(k) be the probabilities that the lion is in reserve 1, 2,
or 3, respectively, at time t = k, and let

x(k) = \begin{bmatrix} x_1(k) \\ x_2(k) \\ x_3(k) \end{bmatrix}

be the state vector at that time. Since we know with certainty that the lion is in reserve
2 at time t = 0, the initial state vector is

x(0) = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}

We leave it for you to show that the state vectors over a six-month period are

x(1) = Px(0) = \begin{bmatrix} 0.400 \\ 0.200 \\ 0.400 \end{bmatrix}, \quad
x(2) = Px(1) = \begin{bmatrix} 0.520 \\ 0.240 \\ 0.240 \end{bmatrix}, \quad
x(3) = Px(2) = \begin{bmatrix} 0.500 \\ 0.224 \\ 0.276 \end{bmatrix}

x(4) = Px(3) \approx \begin{bmatrix} 0.505 \\ 0.228 \\ 0.267 \end{bmatrix}, \quad
x(5) = Px(4) \approx \begin{bmatrix} 0.504 \\ 0.227 \\ 0.269 \end{bmatrix}, \quad
x(6) = Px(5) \approx \begin{bmatrix} 0.504 \\ 0.227 \\ 0.269 \end{bmatrix}

As in Example 2, the state vectors here seem to stabilize over time with a probability of
approximately 0.504 that the lion is in reserve 1, a probability of approximately 0.227
that it is in reserve 2, and a probability of approximately 0.269 that it is in reserve 3.


Andrei Andreyevich Markov (1856-1922)

Markov chains are named in honor of the Russian mathematician A. A. Markov, a lover
of poetry, who used them to analyze the alternation of vowels and consonants in the
poem Eugene Onegin by Pushkin. Markov believed that the only applications of his
chains were to the analysis of literary works, so he would be astonished to learn that
his discovery is used today in the social sciences, quantum theory, and genetics!

[Image: SPL/Science Source]


Markov Chains in Terms of Powers of the Transition Matrix

In a Markov chain with an initial state of x(0), the successive state vectors are

x(1) = Px(0), x(2) = Px(1), x(3) = Px(2), x(4) = Px(3), ...

For brevity, it is common to denote x(k) by x_k, which allows us to write the successive
state vectors more briefly as

x_1 = Px_0, x_2 = Px_1, x_3 = Px_2, x_4 = Px_3, ...    (11)

Alternatively, these state vectors can be expressed in terms of the initial state vector x_0
as

x_1 = Px_0, x_2 = P(Px_0) = P^2 x_0, x_3 = P(P^2 x_0) = P^3 x_0, x_4 = P(P^3 x_0) = P^4 x_0, ...

from which it follows that

x_k = P^k x_0    (12)

Note that Formula (12) makes it possible to compute the state vector x_k without first
computing the earlier state vectors as required in Formula (11).


► EXAMPLE 5 Finding a State Vector Directly from x_0

Use Formula (12) to find the state vector x(3) in Example 2.

Solution From (1) and (7), the initial state vector and transition matrix are

x_0 = x(0) = \begin{bmatrix} 0.5 \\ 0.5 \end{bmatrix}
\quad and \quad
P = \begin{bmatrix} 0.8 & 0.1 \\ 0.2 & 0.9 \end{bmatrix}

We leave it for you to calculate P^3 and show that

x(3) = x_3 = P^3 x_0
     = \begin{bmatrix} 0.562 & 0.219 \\ 0.438 & 0.781 \end{bmatrix}
       \begin{bmatrix} 0.5 \\ 0.5 \end{bmatrix}
     = \begin{bmatrix} 0.3905 \\ 0.6095 \end{bmatrix}

which agrees with the result in (8).


Long-Term Behavior of a Markov Chain

We have seen two examples of Markov chains in which the state vectors seem to stabilize 
after a period of time. Thus, it is reasonable to ask whether all Markov chains have this 
property. The following example shows that this is not the case. 

► EXAMPLE 6 A Markov Chain That Does Not Stabilize 

The matrix

P = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}

is stochastic and hence can be regarded as the transition matrix for a Markov chain. A
simple calculation shows that P^2 = I, from which it follows that

I = P^2 = P^4 = P^6 = \cdots and P = P^3 = P^5 = P^7 = \cdots

Thus, the successive states in the Markov chain with initial vector x_0 are

x_0, Px_0, x_0, Px_0, x_0, ...

which oscillate between x_0 and Px_0. Thus, the Markov chain does not stabilize unless
both components of x_0 are 1/2 (verify).
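
Formula (12) and the oscillation in Example 6 can both be observed by computing powers of the transition matrix. This is a minimal sketch: the helper names mat_mul, mat_pow, and state_at are illustrative, the power is formed by plain repeated multiplication, and the starting vector (0.3, 0.7) is an arbitrary choice made for the demonstration.

```python
def mat_mul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def mat_pow(P, k):
    """P^k by repeated multiplication, starting from the identity matrix."""
    n = len(P)
    result = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    for _ in range(k):
        result = mat_mul(result, P)
    return result

def state_at(P, x0, k):
    """State vector x_k = P^k x_0, as in Formula (12)."""
    return [sum(p * x for p, x in zip(row, x0)) for row in mat_pow(P, k)]

# Example 5: the market-share chain after three years.
print(state_at([[0.8, 0.1], [0.2, 0.9]], [0.5, 0.5], 3))

# Example 6: the states alternate between x_0 and P x_0.
swap = [[0, 1], [1, 0]]
for k in range(5):
    print(k, state_at(swap, [0.3, 0.7], k))
```

With the starting vector (0.5, 0.5) the swap chain prints the same state at every step, which is the one exception noted in Example 6.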


A precise definition of what it means for a sequence of numbers or vectors to stabilize 
is given in calculus; however, that level of precision will not be needed here. Stated 
informally, we will say that a sequence of vectors

x_1, x_2, ..., x_k, ...

approaches a limit q or that it converges to q if all entries in x_k can be made as close as we
like to the corresponding entries in the vector q by taking k sufficiently large. We denote
this by writing x_k -> q as k -> \infty. Similarly, we say that a sequence of matrices

P_1, P_2, P_3, ..., P_k, ...

converges to a matrix Q, written P_k -> Q as k -> \infty, if each entry of P_k can be made as
close as we like to the corresponding entry of Q by taking k sufficiently large.

We saw in Example 6 that the state vectors of a Markov chain need not approach a 
limit in all cases. However, by imposing a mild condition on the transition matrix of a 
Markov chain, we can guarantee that the state vectors will approach a limit. 


DEFINITION 2 A stochastic matrix P is said to be regular if P or some positive power 
of P has all positive entries, and a Markov chain whose transition matrix is regular 
is said to be a regular Markov chain. 


► EXAMPLE 7 Regular Stochastic Matrices 


The transition matrices in Examples 2 and 4 are regular because their entries are positive.
The matrix

$$P = \begin{bmatrix} 0.5 & 1 \\ 0.5 & 0 \end{bmatrix}$$

is regular because

$$P^2 = \begin{bmatrix} 0.75 & 0.5 \\ 0.25 & 0.5 \end{bmatrix}$$

has positive entries. The matrix $P$ in Example 6 is not regular because $P$ and every
positive power of $P$ have some zero entries (verify).
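Definition 2 can be tested mechanically. The sketch below is an illustration rather than a method from the text: the function name `is_regular` is hypothetical, and the stopping point relies on a standard fact not stated in this section, Wielandt's bound, which says that if any power of a nonnegative $n \times n$ matrix has all positive entries, then the power $(n-1)^2 + 1$ already does. Only the pattern of zero and nonzero entries matters, so the code tracks that pattern with 0s and 1s to avoid round-off.

```python
def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def is_regular(P):
    """Return (True, k) if P^k has all positive entries for some k >= 1,
    otherwise (False, None). P is assumed to be a stochastic matrix."""
    n = len(P)
    # 0/1 pattern of P: the positions of positive entries in P^k depend
    # only on the positions of positive entries in P.
    pattern = [[1 if v > 0 else 0 for v in row] for row in P]
    power = pattern
    max_power = (n - 1) ** 2 + 1  # Wielandt's bound (an outside fact)
    for k in range(1, max_power + 1):
        if all(v > 0 for row in power for v in row):
            return True, k
        product = mat_mul(power, pattern)
        power = [[1 if v > 0 else 0 for v in row] for row in product]
    return False, None

print(is_regular([[0.5, 1], [0.5, 0]]))  # (True, 2)   matrix from Example 7
print(is_regular([[0, 1], [1, 0]]))      # (False, None)   matrix from Example 6
```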


The following theorem, which we state without proof, is the fundamental result about
the long-term behavior of Markov chains.

THEOREM 5.5.1 If $P$ is the transition matrix for a regular Markov chain, then:

(a) There is a unique probability vector $\mathbf{q}$ with positive entries such that $P\mathbf{q} = \mathbf{q}$.

(b) For any initial probability vector $\mathbf{x}_0$, the sequence of state vectors

$$\mathbf{x}_0,\ P\mathbf{x}_0,\ \ldots,\ P^k\mathbf{x}_0,\ \ldots$$

converges to $\mathbf{q}$.

(c) The sequence $P, P^2, P^3, \ldots, P^k, \ldots$ converges to the matrix $Q$ each of whose
column vectors is $\mathbf{q}$.


The vector $\mathbf{q}$ in Theorem 5.5.1 is called the steady-state vector of the Markov chain.
Because it is a nonzero vector that satisfies the equation $P\mathbf{q} = \mathbf{q}$, it is an eigenvector
corresponding to the eigenvalue $\lambda = 1$ of $P$. Thus, $\mathbf{q}$ can be found by solving the linear
system

$$(I - P)\mathbf{q} = \mathbf{0} \tag{13}$$

subject to the requirement that $\mathbf{q}$ be a probability vector. Here are some examples.


► EXAMPLE 8 Examples 1 and 2 Revisited 

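The transition matrix for the Markov chain in Example 2 is

$$P = \begin{bmatrix} 0.8 & 0.1 \\ 0.2 & 0.9 \end{bmatrix}$$

Since the entries of $P$ are positive, the Markov chain is regular and hence has a unique
steady-state vector $\mathbf{q}$. To find $\mathbf{q}$ we will solve the system $(I - P)\mathbf{q} = \mathbf{0}$, which we can
write as

$$\begin{bmatrix} 0.2 & -0.1 \\ -0.2 & 0.1 \end{bmatrix} \begin{bmatrix} q_1 \\ q_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$$

The general solution of this system is

$$q_1 = 0.5s, \quad q_2 = s$$

(verify), which we can write in vector form as

$$\mathbf{q} = \begin{bmatrix} q_1 \\ q_2 \end{bmatrix} = \begin{bmatrix} 0.5s \\ s \end{bmatrix} = \begin{bmatrix} \tfrac{1}{2}s \\ s \end{bmatrix} \tag{14}$$

For $\mathbf{q}$ to be a probability vector, we must have

$$1 = q_1 + q_2 = \tfrac{3}{2}s$$

which implies that $s = \tfrac{2}{3}$. Substituting this value in (14) yields the steady-state vector

$$\mathbf{q} = \begin{bmatrix} \tfrac{1}{3} \\ \tfrac{2}{3} \end{bmatrix}$$

which is consistent with the numerical results obtained in (9).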
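Part (b) of Theorem 5.5.1 says that the same limit is reached from any initial probability vector. The following sketch illustrates this for the matrix of Example 8 by iterating $\mathbf{x}_{k+1} = P\mathbf{x}_k$ until successive state vectors agree to within a tolerance. The function name, the tolerance, and the step limit are assumptions made for this illustration; the exact steady-state vector still comes from solving (13).

```python
def mat_vec(P, x):
    """Multiply a square matrix (list of rows) by a column vector."""
    return [sum(a * b for a, b in zip(row, x)) for row in P]

def iterate_until_stable(P, x0, tol=1e-12, max_steps=10_000):
    """Apply x <- P x until no entry changes by more than tol.
    Returns the final vector and the number of steps taken."""
    x = list(x0)
    for step in range(1, max_steps + 1):
        nxt = mat_vec(P, x)
        if max(abs(a - b) for a, b in zip(nxt, x)) < tol:
            return nxt, step
        x = nxt
    raise RuntimeError("state vectors did not stabilize within max_steps")

P = [[0.8, 0.1],
     [0.2, 0.9]]

# Three different initial probability vectors; each run should approach
# the steady-state vector (1/3, 2/3) found in Example 8.
for start in ([0.5, 0.5], [1.0, 0.0], [0.0, 1.0]):
    q, steps = iterate_until_stable(P, start)
    print(start, "->", [round(v, 6) for v in q], "after", steps, "steps")
```

For the matrix of Example 6 the same loop never meets the tolerance (unless both components of the starting vector are 1/2), which matches the oscillation described there.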


► EXAMPLE 9 Example 4 Revisited 

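The transition matrix for the Markov chain in Example 4 is

$$P = \begin{bmatrix} 0.5 & 0.4 & 0.6 \\ 0.2 & 0.2 & 0.3 \\ 0.3 & 0.4 & 0.1 \end{bmatrix}$$

Since the entries of $P$ are positive, the Markov chain is regular and hence has a unique
steady-state vector $\mathbf{q}$. To find $\mathbf{q}$ we will solve the system $(I - P)\mathbf{q} = \mathbf{0}$, which we can
write (using fractions) as

$$\begin{bmatrix} \tfrac{1}{2} & -\tfrac{2}{5} & -\tfrac{3}{5} \\ -\tfrac{1}{5} & \tfrac{4}{5} & -\tfrac{3}{10} \\ -\tfrac{3}{10} & -\tfrac{2}{5} & \tfrac{9}{10} \end{bmatrix} \begin{bmatrix} q_1 \\ q_2 \\ q_3 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} \tag{15}$$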


(We have converted to fractions to avoid roundoff error in this illustrative example.) We
leave it for you to confirm that the reduced row echelon form of the coefficient matrix is

$$\begin{bmatrix} 1 & 0 & -\tfrac{15}{8} \\ 0 & 1 & -\tfrac{27}{32} \\ 0 & 0 & 0 \end{bmatrix}$$

and that the general solution of (15) is

$$q_1 = \tfrac{15}{8}s, \quad q_2 = \tfrac{27}{32}s, \quad q_3 = s \tag{16}$$

For $\mathbf{q}$ to be a probability vector we must have $q_1 + q_2 + q_3 = 1$, from which it follows
that $s = \tfrac{32}{119}$ (verify). Substituting this value in (16) yields the steady-state vector

$$\mathbf{q} = \begin{bmatrix} \tfrac{60}{119} \\ \tfrac{27}{119} \\ \tfrac{32}{119} \end{bmatrix} \approx \begin{bmatrix} 0.5042 \\ 0.2269 \\ 0.2689 \end{bmatrix}$$

(verify), which is consistent with the results obtained in Example 4.
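The fraction arithmetic in Example 9 can be reproduced exactly with Python's `fractions` module. The sketch below is one possible arrangement, not the procedure used in the text: instead of finding the general solution and then normalizing, it replaces the last equation of $(I - P)\mathbf{q} = \mathbf{0}$ with the condition $q_1 + \cdots + q_n = 1$ and solves the resulting square system by Gauss-Jordan elimination. This is valid for a regular chain because the equations in (13) are dependent (each column of $I - P$ sums to zero), so one of them can be dropped. The function name `steady_state` is hypothetical.

```python
from fractions import Fraction as F

def steady_state(P):
    """Exact steady-state vector of a regular stochastic matrix P
    (list of rows with Fraction entries)."""
    n = len(P)
    # Coefficient matrix I - P, with the last row replaced by all 1's
    # to impose q_1 + ... + q_n = 1.
    A = [[F(int(i == j)) - F(P[i][j]) for j in range(n)] for i in range(n)]
    A[-1] = [F(1)] * n
    b = [F(0)] * (n - 1) + [F(1)]
    # Gauss-Jordan elimination on the augmented matrix [A | b].
    M = [row + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [a - factor * c for a, c in zip(M[r], M[col])]
    return [M[i][n] for i in range(n)]

# Transition matrix from Example 4, entered as exact fractions.
P = [[F(1, 2),  F(2, 5), F(3, 5)],
     [F(1, 5),  F(1, 5), F(3, 10)],
     [F(3, 10), F(2, 5), F(1, 10)]]

q = steady_state(P)
print(q)                              # [Fraction(60, 119), Fraction(27, 119), Fraction(32, 119)]
print([round(float(v), 4) for v in q])  # [0.5042, 0.2269, 0.2689]
```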


Exercise Set 5.5 

In Exercises 1-2, determine whether A is a stochastic matrix. 
If A is not stochastic, then explain why not. 


4. P = 


1. (a) A = 


(c) A = 


2. (a) A = 


(c) A = 


0.4 

0.6 

T 

0 

0 

0.2 

0.8 

12 

1 

2 
5_ 

L 12 


0.3' 

0.7 


(b) A = 


0.4 

0.3 


0 . 6 ' 

0.7 


\_ 

2 

0.9' 

0.1 

l 

9 


(d) A = 


0.9 

'-1 

0 

2 


0.1 

l 

3 

i 

3 

X 

3 


0.8 

0.2 


0.5' 

0.5 


Xo = 


In Exercises 5-6, determine whether P is a regular stochastic 
matrix. 




“ 1 1 

3 3 

1 “ 

2 

5. (a) P = 

" 1 1 “ 

5 7 

(b) p = 

A o' 

(c) P = 


(d) A = 

1 1 

6 3 

1 

2 


4 6 

. 5 7 _ 


J 1 . 




1 1 

_ 2 3 

1 


n n 


_1 11 

" 





6. (a) P = 


(b) P = 


(C) P = 


(h3 A = 

°o 

O 

<N 

O 



A 0 . 


.0 L 



In Exercises 7-10, verify that P is a regular stochastic matrix,
and find the steady-state vector for the associated Markov chain.


In Exercises 3-4, use Formulas (11) and (12) to compute the
state vector $\mathbf{x}_4$ in two different ways.


3. $P = \begin{bmatrix} 0.5 & 0.6 \\ 0.5 & 0.4 \end{bmatrix}; \quad \mathbf{x}_0 = \begin{bmatrix} 0.5 \\ 0.5 \end{bmatrix}$

7. P = 


9. P = 


X 

4 

3 

L 4 

X 

2 

i_ 

4 

x 

.4 


0 I 


8 . P = 


10. P = 


0.2 0.6 
0.8 0.4 


0 1 2 

v A C 


— 0 — 
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11. Consider a Markov process with transition matrix 

             State 1   State 2
   State 1     0.2       0.1
   State 2     0.8       0.9

(a) What does the entry 0.2 represent? 

(b) What does the entry 0.1 represent? 

(c) If the system is in state 1 initially, what is the probability 
that it will be in state 2 at the next observation? 

(d) If the system has a 50% chance of being in state 1 initially, 
what is the probability that it will be in state 2 at the next 
observation? 

12. Consider a Markov process with transition matrix 

             State 1   State 2
   State 1      0        1/7
   State 2      1        6/7

(a) What does the entry | represent? 

(b) What does the entry 0 represent? 

(c) If the system is in state 1 initially, what is the probability 
that it will be in state 1 at the next observation? 

(d) If the system has a 50% chance of being in state 1 initially, 
what is the probability that it will be in state 2 at the next 
observation? 

13. On a given day the air quality in a certain city is either good or 
bad. Records show that when the air quality is good on one 
day, then there is a 95% chance that it will be good the next 
day, and when the air quality is bad on one day, then there is 
a 45% chance that it will be bad the next day. 

(a) Find a transition matrix for this phenomenon. 

(b) If the air quality is good today, what is the probability that 
it will be good two days from now? 

(c) If the air quality is bad today, what is the probability that 
it will be bad three days from now? 

(d) If there is a 20% chance that the air quality will be 
good today, what is the probability that it will be good 
tomorrow? 

14. In a laboratory experiment, a mouse can choose one of two 
food types each day, type I or type II. Records show that if 
the mouse chooses type I on a given day, then there is a 75% 
chance that it will choose type I the next day, and if it chooses
type II on one day, then there is a 50% chance that it will 
choose type II the next day. 

(a) Find a transition matrix for this phenomenon. 

(b) If the mouse chooses type I today, what is the probability
that it will choose type I two days from now? 


(c) If the mouse chooses type II today, what is the probability 
that it will choose type II three days from now? 

(d) If there is a 10% chance that the mouse will choose type 
I today, what is the probability that it will choose type I 
tomorrow? 

15. Suppose that at some initial point in time 100,000 people live 
in a certain city and 25,000 people live in its suburbs. The 
Regional Planning Commission determines that each year 5% 
of the city population moves to the suburbs and 3% of the 
suburban population moves to the city. 

(a) Assuming that the total population remains constant, 
make a table that shows the populations of the city and 
its suburbs over a five-year period (round to the nearest 
integer). 

(b) Over the long term, how will the population be distributed 
between the city and its suburbs? 

16. Suppose that two competing television stations, station 1 and 
station 2, each have 50% of the viewer market at some initial 
point in time. Assume that over each one-year period station 1 
captures 5% of station 2's market share and station 2 captures 
10% of station 1's market share.

(a) Make a table that shows the market share of each station 
over a five-year period. 

(b) Over the long term, how will the market share be dis- 
tributed between the two stations? 

17. Fill in the missing entries of the stochastic matrix 



and find its steady-state vector. 

18. If $P$ is an $n \times n$ stochastic matrix, and if $M$ is a $1 \times n$ matrix
whose entries are all 1's, then $MP =$ ______.

19. If $P$ is a regular stochastic matrix with steady-state vector $\mathbf{q}$,
what can you say about the sequence of products

$$P\mathbf{q},\ P^2\mathbf{q},\ P^3\mathbf{q},\ \ldots,\ P^k\mathbf{q},\ \ldots$$

as $k \to \infty$?

20. (a) If $P$ is a regular $n \times n$ stochastic matrix with steady-state
vector $\mathbf{q}$, and if $\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n$ are the standard unit vectors
in column form, what can you say about the behavior of
the sequence

$$P\mathbf{e}_i,\ P^2\mathbf{e}_i,\ P^3\mathbf{e}_i,\ \ldots,\ P^k\mathbf{e}_i,\ \ldots$$

as $k \to \infty$ for each $i = 1, 2, \ldots, n$?

(b) What does this tell you about the behavior of the column
vectors of $P^k$ as $k \to \infty$?

Working with Proofs

21. Prove that the product of two stochastic matrices with the 
same size is a stochastic matrix. [Hint: Write each column of 
the product as a linear combination of the columns of the first 
factor.] 

22. Prove that if P is a stochastic matrix whose entries are all 
greater than or equal to p , then the entries of P 2 are greater 
than or equal to p. 


True-False Exercises

TF. In parts (a)-(g) determine whether the statement is true or
false, and justify your answer.

(a) The vector $\begin{bmatrix} \tfrac{1}{3} \\ 0 \\ \tfrac{2}{3} \end{bmatrix}$ is a probability vector.

(b) The matrix $\begin{bmatrix} 0.2 & 1 \\ 0.8 & 0 \end{bmatrix}$ is a regular stochastic matrix.

(c) The column vectors of a transition matrix are probability
vectors.

(d) A steady-state vector for a Markov chain with transition matrix
$P$ is any solution of the linear system $(I - P)\mathbf{q} = \mathbf{0}$.

(e) The square of every regular stochastic matrix is stochastic.

(f) A vector with real entries that sum to 1 is a probability vector.

(g) Every regular stochastic matrix has $\lambda = 1$ as an eigenvalue.

Working with Technology

T1. In Examples 4 and 9 we considered the Markov chain with
transition matrix $P$ and initial state vector $\mathbf{x}(0)$ where

$$P = \begin{bmatrix} 0.5 & 0.4 & 0.6 \\ 0.2 & 0.2 & 0.3 \\ 0.3 & 0.4 & 0.1 \end{bmatrix} \quad\text{and}\quad \mathbf{x}(0) = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}$$

(a) Confirm the numerical values of $\mathbf{x}(1), \mathbf{x}(2), \ldots, \mathbf{x}(6)$ obtained
in Example 4 using the method given in that example.

(b) As guaranteed by part (c) of Theorem 5.5.1, confirm that the
sequence $P, P^2, P^3, \ldots, P^k, \ldots$ converges to the matrix $Q$
each of whose column vectors is the steady-state vector $\mathbf{q}$
obtained in Example 9.

T2. Suppose that a car rental agency has three locations, numbered
1, 2, and 3. A customer may rent a car from any of the three
locations and return it to any of the three locations. Records show
that cars are rented and returned in accordance with the following
probabilities:

                       Rented from Location
                        1       2       3
   Returned to    1    1/10    1/5     3/5
   Location       2    4/5     3/10    1/5
                  3    1/10    1/2     1/5

(a) Assuming that a car is rented from location 1, what is the
probability that it will be at location 1 after two rentals?

(b) Assuming that this dynamical system can be modeled as a
Markov chain, find the steady-state vector.

(c) If the rental agency owns 120 cars, how many parking spaces
should it allocate at each location to be reasonably certain
that it will have enough spaces for the cars over the long term?
Explain your reasoning.

T3. Physical traits are determined by the genes that an offspring
receives from its parents. In the simplest case a trait in the offspring
is determined by one pair of genes, one member of the pair
inherited from the male parent and the other from the female parent.
Typically, each gene in a pair can assume one of two forms,
called alleles, denoted by A and a. This leads to three possible
pairings:

AA, Aa, aa

called genotypes (the pairs Aa and aA determine the same trait
and hence are not distinguished from one another). It is shown in
the study of heredity that if a parent of known genotype is crossed
with a random parent of unknown genotype, then the offspring
will have the genotype probabilities given in the following table,
which can be viewed as a transition matrix for a Markov process:

                        Genotype of Parent
                        AA      Aa      aa
   Genotype of    AA    1/2     1/4     0
   Offspring      Aa    1/2     1/2     1/2
                  aa    0       1/4     1/2

Thus, for example, the offspring of a parent of genotype AA that
is crossed at random with a parent of unknown genotype will have
a 50% chance of being AA, a 50% chance of being Aa, and no
chance of being aa.

(a) Show that the transition matrix is regular.

(b) Find the steady-state vector, and discuss its physical interpretation.

Chapter 5 Supplementary Exercises

1. (a) Show that if $0 < \theta < \pi$, then

$$A = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$$

has no real eigenvalues and consequently no real eigenvectors.

(b) Give a geometric explanation of the result in part (a). 

2. Find the eigenvalues of

$$A = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ k^3 & -3k^2 & 3k \end{bmatrix}$$


3. (a) Show that if $D$ is a diagonal matrix with nonnegative entries
on the main diagonal, then there is a matrix $S$ such
that $S^2 = D$.

(b) Show that if $A$ is a diagonalizable matrix with nonnegative
eigenvalues, then there is a matrix $S$ such that $S^2 = A$.

(c) Find a matrix $S$ such that $S^2 = A$, given that

$$A = \begin{bmatrix} 1 & 3 & 1 \\ 0 & 4 & 5 \\ 0 & 0 & 9 \end{bmatrix}$$

4. Given that $A$ and $B$ are similar matrices, in each part determine
whether the given matrices are also similar.

(a) $A^T$ and $B^T$

(b) $A^k$ and $B^k$ ($k$ a positive integer)

(c) $A^{-1}$ and $B^{-1}$ (if $A$ is invertible)


5. Prove: If $A$ is a square matrix and $p(\lambda) = \det(\lambda I - A)$ is the
characteristic polynomial of $A$, then the coefficient of $\lambda^{n-1}$ in
$p(\lambda)$ is the negative of the trace of $A$.


Verify this result for

$$\begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & -3 & 3 \end{bmatrix}$$

In Exercises 8-10, use the Cayley-Hamilton Theorem, stated
in Exercise 7.

8. (a) Use Exercise 28 of Section 5.1 to establish the Cayley-Hamilton
Theorem for $2 \times 2$ matrices.

(b) Prove the Cayley-Hamilton Theorem for $n \times n$ diagonalizable
matrices.

9. The Cayley-Hamilton Theorem provides a method for calculating
powers of a matrix. For example, if $A$ is a $2 \times 2$ matrix
with characteristic equation

$$c_0 + c_1\lambda + \lambda^2 = 0$$

then $c_0 I + c_1 A + A^2 = 0$, so

$$A^2 = -c_1 A - c_0 I$$

Multiplying through by $A$ yields $A^3 = -c_1 A^2 - c_0 A$, which
expresses $A^3$ in terms of $A^2$ and $A$, and multiplying through by
$A^2$ yields $A^4 = -c_1 A^3 - c_0 A^2$, which expresses $A^4$ in terms of
$A^3$ and $A^2$. Continuing in this way, we can calculate successive
powers of $A$ by expressing them in terms of lower powers. Use
this procedure to calculate $A^2$, $A^3$, $A^4$, and $A^5$ for


"3 6 

1 2 

(b) A = 

L - 1 



10. Use the method of the preceding exercise to calculate $A^3$ and
$A^4$ for

$$A = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & -3 & 3 \end{bmatrix}$$


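11. Find the eigenvalues of the matrix

$$C = \begin{bmatrix} c_1 & c_2 & \cdots & c_n \\ c_1 & c_2 & \cdots & c_n \\ \vdots & \vdots & & \vdots \\ c_1 & c_2 & \cdots & c_n \end{bmatrix}$$

6. Prove: If $b \neq 0$, then

$$A = \begin{bmatrix} a & b \\ 0 & a \end{bmatrix}$$

is not diagonalizable.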

7. In advanced linear algebra, one proves the Cayley-Hamilton
Theorem, which states that a square matrix $A$ satisfies its
characteristic equation; that is, if

$$c_0 + c_1\lambda + c_2\lambda^2 + \cdots + c_{n-1}\lambda^{n-1} + \lambda^n = 0$$

is the characteristic equation of $A$, then

$$c_0 I + c_1 A + c_2 A^2 + \cdots + c_{n-1}A^{n-1} + A^n = 0$$

12. (a) It was shown in Exercise 37 of Section 5.1 that if $A$ is an
$n \times n$ matrix, then the coefficient of $\lambda^n$ in the characteristic
polynomial of $A$ is 1. (A polynomial with this property
is called monic.) Show that the matrix

$$\begin{bmatrix} 0 & 0 & 0 & \cdots & 0 & -c_0 \\ 1 & 0 & 0 & \cdots & 0 & -c_1 \\ 0 & 1 & 0 & \cdots & 0 & -c_2 \\ \vdots & \vdots & \vdots & & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & 1 & -c_{n-1} \end{bmatrix}$$

has characteristic polynomial

$$p(\lambda) = c_0 + c_1\lambda + \cdots + c_{n-1}\lambda^{n-1} + \lambda^n$$

This shows that every monic polynomial is the characteristic
polynomial of some matrix. The matrix in this example
is called the companion matrix of $p(\lambda)$. [Hint: Evaluate
all determinants in the problem by adding a multiple of
the second row to the first to introduce a zero at the top of
the first column, and then expanding by cofactors along
the first column.]

(b) Find a matrix with characteristic polynomial
$p(\lambda) = 1 - 2\lambda + \lambda^2 + 3\lambda^3 + \lambda^4$

13. A square matrix $A$ is called nilpotent if $A^n = 0$ for some positive
integer $n$. What can you say about the eigenvalues of a
nilpotent matrix?

14. Prove: If A is an n x n matrix and n is odd, then A has at least 
one real eigenvalue. 

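15. Find a $3 \times 3$ matrix $A$ that has eigenvalues $\lambda = 0$, $1$, and $-1$
with corresponding eigenvectors

$$\begin{bmatrix} 0 \\ 1 \\ -1 \end{bmatrix}, \quad \begin{bmatrix} 1 \\ -1 \\ 1 \end{bmatrix}, \quad \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}$$

respectively.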


16. Suppose that a $4 \times 4$ matrix $A$ has eigenvalues $\lambda_1 = 1$,
$\lambda_2 = -2$, $\lambda_3 = 3$, and $\lambda_4 = -3$.

(a) Use the method of Exercise 24 of Section 5.1 to find
$\det(A)$.

(b) Use Exercise 5 above to find $\operatorname{tr}(A)$.

17. Let A be a square matrix such that A 3 = A. What can you 
say about the eigenvalues of A? 

18. (a) Solve the system

$$y_1' = y_1 + 3y_2$$
$$y_2' = -y_1 + 4y_2$$

(b) Find the solution satisfying the initial conditions
$y_1(0) = 5$ and $y_2(0) = 6$.

19. Let A be a 3 x 3 matrix, one of whose eigenvalues is 1 . Given 
that both the sum and the product of all three eigenvalues is 6, 
what are the possible values for the remaining two eigenvalues? 

20. Show that the matrices

$$\begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{bmatrix} \quad\text{and}\quad D = \begin{bmatrix} d_1 & 0 & 0 \\ 0 & d_2 & 0 \\ 0 & 0 & d_3 \end{bmatrix}$$

are similar if

$$d_k = \cos\frac{2\pi k}{3} + i\sin\frac{2\pi k}{3} \quad (k = 1, 2, 3)$$


INTRODUCTION In Chapter 3 we defined the dot product of vectors in $R^n$, and we used that concept to
define notions of length, angle, distance, and orthogonality. In this chapter we will
generalize those ideas so they are applicable in any vector space, not just $R^n$. We will
also discuss various applications of these ideas.


6.1 Inner Products 

In this section we will use the most important properties of the dot product on $R^n$ as
axioms, which, if satisfied by the vectors in a vector space $V$, will enable us to extend the
notions of length, distance, angle, and perpendicularity to general vector spaces.


General Inner Products 


Note that Definition 1 applies 
only to real vector spaces. A 
definition of inner products on 
complex vector spaces is given 
in the exercises. Since we will 
have little need for complex 
vector spaces from this point 
on, you can assume that all 
vector spaces under discussion 
are real, even though some of 
the theorems are also valid in 
complex vector spaces. 


In Definition 4 of Section 3.2 we defined the dot product of two vectors in $R^n$, and in
Theorem 3.2.2 we listed four fundamental properties of such products. Our first goal
in this section is to extend the notion of a dot product to general real vector spaces by
using those four properties as axioms. We make the following definition.


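DEFINITION 1 An inner product on a real vector space $V$ is a function that associates
a real number $\langle \mathbf{u}, \mathbf{v} \rangle$ with each pair of vectors in $V$ in such a way that the following
axioms are satisfied for all vectors $\mathbf{u}$, $\mathbf{v}$, and $\mathbf{w}$ in $V$ and all scalars $k$.

1. $\langle \mathbf{u}, \mathbf{v} \rangle = \langle \mathbf{v}, \mathbf{u} \rangle$ [Symmetry axiom]

2. $\langle \mathbf{u} + \mathbf{v}, \mathbf{w} \rangle = \langle \mathbf{u}, \mathbf{w} \rangle + \langle \mathbf{v}, \mathbf{w} \rangle$ [Additivity axiom]

3. $\langle k\mathbf{u}, \mathbf{v} \rangle = k\langle \mathbf{u}, \mathbf{v} \rangle$ [Homogeneity axiom]

4. $\langle \mathbf{v}, \mathbf{v} \rangle \geq 0$ and $\langle \mathbf{v}, \mathbf{v} \rangle = 0$ if and only if $\mathbf{v} = \mathbf{0}$ [Positivity axiom]

A real vector space with an inner product is called a real inner product space.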


Because the axioms for a real inner product space are based on properties of the dot
product, these inner product space axioms will be satisfied automatically if we define the
inner product of two vectors $\mathbf{u}$ and $\mathbf{v}$ in $R^n$ to be

$$\langle \mathbf{u}, \mathbf{v} \rangle = \mathbf{u} \cdot \mathbf{v} = u_1v_1 + u_2v_2 + \cdots + u_nv_n \tag{1}$$

This inner product is commonly called the Euclidean inner product (or the standard inner
product) on $R^n$ to distinguish it from other possible inner products that might be defined
on $R^n$. We call $R^n$ with the Euclidean inner product Euclidean n-space.

Inner products can be used to define notions of norm and distance in a general inner
product space just as we did with dot products in $R^n$. Recall from Formulas (11) and (19)
of Section 3.2 that if $\mathbf{u}$ and $\mathbf{v}$ are vectors in Euclidean n-space, then norm and distance
can be expressed in terms of the dot product as

$$\|\mathbf{v}\| = \sqrt{\mathbf{v} \cdot \mathbf{v}} \quad\text{and}\quad d(\mathbf{u}, \mathbf{v}) = \|\mathbf{u} - \mathbf{v}\| = \sqrt{(\mathbf{u} - \mathbf{v}) \cdot (\mathbf{u} - \mathbf{v})}$$

Motivated by these formulas, we make the following definition.


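DEFINITION 2 If $V$ is a real inner product space, then the norm (or length) of a vector
$\mathbf{v}$ in $V$ is denoted by $\|\mathbf{v}\|$ and is defined by

$$\|\mathbf{v}\| = \sqrt{\langle \mathbf{v}, \mathbf{v} \rangle}$$

and the distance between two vectors is denoted by $d(\mathbf{u}, \mathbf{v})$ and is defined by

$$d(\mathbf{u}, \mathbf{v}) = \|\mathbf{u} - \mathbf{v}\| = \sqrt{\langle \mathbf{u} - \mathbf{v}, \mathbf{u} - \mathbf{v} \rangle}$$

A vector of norm 1 is called a unit vector.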


The following theorem, whose proof is left for the exercises, shows that norms and
distances in real inner product spaces have many of the properties that you might expect.


THEOREM 6.1.1 If $\mathbf{u}$ and $\mathbf{v}$ are vectors in a real inner product space $V$, and if $k$ is a
scalar, then:

(a) $\|\mathbf{v}\| \geq 0$ with equality if and only if $\mathbf{v} = \mathbf{0}$.

(b) $\|k\mathbf{v}\| = |k|\,\|\mathbf{v}\|$.

(c) $d(\mathbf{u}, \mathbf{v}) = d(\mathbf{v}, \mathbf{u})$.

(d) $d(\mathbf{u}, \mathbf{v}) \geq 0$ with equality if and only if $\mathbf{u} = \mathbf{v}$.


Although the Euclidean inner product is the most important inner product on $R^n$,
there are various applications in which it is desirable to modify it by weighting each term
differently. More precisely, if

$$w_1, w_2, \ldots, w_n$$

are positive real numbers, which we will call weights, and if $\mathbf{u} = (u_1, u_2, \ldots, u_n)$ and
$\mathbf{v} = (v_1, v_2, \ldots, v_n)$ are vectors in $R^n$, then it can be shown that the formula

$$\langle \mathbf{u}, \mathbf{v} \rangle = w_1u_1v_1 + w_2u_2v_2 + \cdots + w_nu_nv_n \tag{2}$$

defines an inner product on $R^n$ that we call the weighted Euclidean inner product with
weights $w_1, w_2, \ldots, w_n$.


Note that the standard Euclidean inner product in Formula (1) is the special case
of the weighted Euclidean inner product in which all the weights are 1.


► EXAMPLE 1 Weighted Euclidean Inner Product 

Let $\mathbf{u} = (u_1, u_2)$ and $\mathbf{v} = (v_1, v_2)$ be vectors in $R^2$. Verify that the weighted Euclidean
inner product

$$\langle \mathbf{u}, \mathbf{v} \rangle = 3u_1v_1 + 2u_2v_2 \tag{3}$$

satisfies the four inner product axioms.

In Example 1, we are using subscripted $w$'s to denote the components of the vector $\mathbf{w}$,
not the weights. The weights are the numbers 3 and 2 in Formula (3).


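Solution

Axiom 1: Interchanging $\mathbf{u}$ and $\mathbf{v}$ in Formula (3) does not change the sum on the right
side, so $\langle \mathbf{u}, \mathbf{v} \rangle = \langle \mathbf{v}, \mathbf{u} \rangle$.

Axiom 2: If $\mathbf{w} = (w_1, w_2)$, then

$$\begin{aligned} \langle \mathbf{u} + \mathbf{v}, \mathbf{w} \rangle &= 3(u_1 + v_1)w_1 + 2(u_2 + v_2)w_2 \\ &= 3(u_1w_1 + v_1w_1) + 2(u_2w_2 + v_2w_2) \\ &= (3u_1w_1 + 2u_2w_2) + (3v_1w_1 + 2v_2w_2) \\ &= \langle \mathbf{u}, \mathbf{w} \rangle + \langle \mathbf{v}, \mathbf{w} \rangle \end{aligned}$$

Axiom 3:

$$\begin{aligned} \langle k\mathbf{u}, \mathbf{v} \rangle &= 3(ku_1)v_1 + 2(ku_2)v_2 \\ &= k(3u_1v_1 + 2u_2v_2) \\ &= k\langle \mathbf{u}, \mathbf{v} \rangle \end{aligned}$$

Axiom 4: $\langle \mathbf{v}, \mathbf{v} \rangle = 3(v_1v_1) + 2(v_2v_2) = 3v_1^2 + 2v_2^2 \geq 0$ with equality if and only if
$v_1 = v_2 = 0$, that is, if and only if $\mathbf{v} = \mathbf{0}$.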
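The algebra in Example 1 can be spot-checked numerically. The sketch below is not a proof (it tests the axioms only on a few sample vectors, which are arbitrary choices made for this illustration), but it shows how Formula (3) translates into code. The function name `inner` is hypothetical.

```python
from fractions import Fraction

WEIGHTS = (3, 2)  # the weights appearing in Formula (3)

def inner(u, v, weights=WEIGHTS):
    """Weighted Euclidean inner product w1*u1*v1 + w2*u2*v2 + ..."""
    return sum(w * a * b for w, a, b in zip(weights, u, v))

# Arbitrary sample data; exact integers and fractions avoid round-off.
u, v, w = (1, -2), (4, 5), (-3, 7)
k = Fraction(5, 2)

# Axiom 1 (symmetry)
assert inner(u, v) == inner(v, u)
# Axiom 2 (additivity)
u_plus_v = [a + b for a, b in zip(u, v)]
assert inner(u_plus_v, w) == inner(u, w) + inner(v, w)
# Axiom 3 (homogeneity)
assert inner([k * a for a in u], v) == k * inner(u, v)
# Axiom 4 (positivity), checked on one nonzero vector and on the zero vector
assert inner(v, v) > 0 and inner((0, 0), (0, 0)) == 0

print(inner(u, v))  # -8, since 3(1)(4) + 2(-2)(5) = 12 - 20
```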


An Application of Weighted 
Euclidean Inner Products 


To illustrate one way in which a weighted Euclidean inner product can arise, suppose
that some physical experiment has $n$ possible numerical outcomes

$$x_1, x_2, \ldots, x_n$$

and that a series of $m$ repetitions of the experiment yields these values with various
frequencies. Specifically, suppose that $x_1$ occurs $f_1$ times, $x_2$ occurs $f_2$ times, and so
forth. Since there is a total of $m$ repetitions of the experiment, it follows that

$$f_1 + f_2 + \cdots + f_n = m$$

Thus, the arithmetic average of the observed numerical values (denoted by $\bar{x}$) is

$$\bar{x} = \frac{f_1x_1 + f_2x_2 + \cdots + f_nx_n}{f_1 + f_2 + \cdots + f_n} = \frac{1}{m}(f_1x_1 + f_2x_2 + \cdots + f_nx_n) \tag{4}$$

If we let

$$\mathbf{f} = (f_1, f_2, \ldots, f_n), \quad \mathbf{x} = (x_1, x_2, \ldots, x_n), \quad w_1 = w_2 = \cdots = w_n = 1/m$$

then (4) can be expressed as the weighted Euclidean inner product

$$\bar{x} = \langle \mathbf{f}, \mathbf{x} \rangle = w_1f_1x_1 + w_2f_2x_2 + \cdots + w_nf_nx_n$$
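As a small illustration of Formula (4), the sketch below computes an average both directly and as a weighted Euclidean inner product with all weights equal to $1/m$. The outcomes and frequencies are made-up sample data, and the function name `weighted_inner` is hypothetical.

```python
from fractions import Fraction

def weighted_inner(u, v, weights):
    """Weighted Euclidean inner product w1*u1*v1 + ... + wn*un*vn."""
    return sum(w * a * b for w, a, b in zip(weights, u, v))

x = [1, 2, 3, 4]   # hypothetical numerical outcomes x_1, ..., x_n
f = [2, 5, 2, 1]   # hypothetical frequencies f_1, ..., f_n
m = sum(f)         # total number of repetitions (10 here)

weights = [Fraction(1, m)] * len(x)          # w_1 = ... = w_n = 1/m
mean_via_inner = weighted_inner(f, x, weights)
mean_direct = Fraction(sum(fi * xi for fi, xi in zip(f, x)), m)

print(mean_via_inner, mean_direct)  # 11/5 11/5
```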


> EXAMPLE 2 Calculating with a Weighted Euclidean Inner Product 

It is important to keep in mind that norm and distance depend on the inner product being
used. If the inner product is changed, then the norms and distances between vectors also
change. For example, for the vectors $\mathbf{u} = (1, 0)$ and $\mathbf{v} = (0, 1)$ in $R^2$ with the Euclidean
inner product we have

$$\|\mathbf{u}\| = \sqrt{1^2 + 0^2} = 1$$

and

$$d(\mathbf{u}, \mathbf{v}) = \|\mathbf{u} - \mathbf{v}\| = \|(1, -1)\| = \sqrt{1^2 + (-1)^2} = \sqrt{2}$$

but if we change to the weighted Euclidean inner product

$$\langle \mathbf{u}, \mathbf{v} \rangle = 3u_1v_1 + 2u_2v_2$$

we have

$$\|\mathbf{u}\| = \langle \mathbf{u}, \mathbf{u} \rangle^{1/2} = [3(1)(1) + 2(0)(0)]^{1/2} = \sqrt{3}$$

and

$$d(\mathbf{u}, \mathbf{v}) = \|\mathbf{u} - \mathbf{v}\| = \langle (1, -1), (1, -1) \rangle^{1/2} = [3(1)(1) + 2(-1)(-1)]^{1/2} = \sqrt{5} \quad ◄$$


Unit Circles and Spheres in Inner Product Spaces

DEFINITION 3 If $V$ is an inner product space, then the set of points in $V$ that satisfy

$$\|\mathbf{u}\| = 1$$

is called the unit sphere or sometimes the unit circle in $V$.


► EXAMPLE 3 Unusual Unit Circles in $R^2$

(a) Sketch the unit circle in an xy-coordinate system in $R^2$ using the Euclidean inner
product $\langle \mathbf{u}, \mathbf{v} \rangle = u_1v_1 + u_2v_2$.

(b) Sketch the unit circle in an xy-coordinate system in $R^2$ using the weighted Euclidean
inner product $\langle \mathbf{u}, \mathbf{v} \rangle = \tfrac{1}{9}u_1v_1 + \tfrac{1}{4}u_2v_2$.

Solution (a) If $\mathbf{u} = (x, y)$, then $\|\mathbf{u}\| = \langle \mathbf{u}, \mathbf{u} \rangle^{1/2} = \sqrt{x^2 + y^2}$, so the equation of the unit
circle is $\sqrt{x^2 + y^2} = 1$, or on squaring both sides,

$$x^2 + y^2 = 1$$

As expected, the graph of this equation is a circle of radius 1 centered at the origin
(Figure 6.1.1a).

Solution (b) If $\mathbf{u} = (x, y)$, then $\|\mathbf{u}\| = \langle \mathbf{u}, \mathbf{u} \rangle^{1/2} = \sqrt{\tfrac{1}{9}x^2 + \tfrac{1}{4}y^2}$, so the equation of the
unit circle is $\sqrt{\tfrac{1}{9}x^2 + \tfrac{1}{4}y^2} = 1$, or on squaring both sides,

$$\frac{x^2}{9} + \frac{y^2}{4} = 1$$

The graph of this equation is the ellipse shown in Figure 6.1.1b. Though this may seem
odd when viewed geometrically, it makes sense algebraically since all points on the ellipse
are 1 unit away from the origin relative to the given weighted Euclidean inner product. In
short, weighting has the effect of distorting the space that we are used to seeing through
"unweighted Euclidean eyes." ◄

[Figure 6.1.1: (a) The unit circle using the standard Euclidean inner product. (b) The unit
circle using a weighted Euclidean inner product.]


Inner Products Generated by Matrices


The Euclidean inner product and the weighted Euclidean inner products are special cases
of a general class of inner products on $R^n$ called matrix inner products. To define this
class of inner products, let $\mathbf{u}$ and $\mathbf{v}$ be vectors in $R^n$ that are expressed in column form,
and let $A$ be an invertible $n \times n$ matrix. It can be shown (Exercise 47) that if $\mathbf{u} \cdot \mathbf{v}$ is the
Euclidean inner product on $R^n$, then the formula

$$\langle \mathbf{u}, \mathbf{v} \rangle = A\mathbf{u} \cdot A\mathbf{v} \tag{5}$$

also defines an inner product; it is called the inner product on $R^n$ generated by $A$.

Recall from Table 1 of Section 3.2 that if $\mathbf{u}$ and $\mathbf{v}$ are in column form, then $\mathbf{u} \cdot \mathbf{v}$ can
be written as $\mathbf{v}^T\mathbf{u}$, from which it follows that (5) can be expressed as

$$\langle \mathbf{u}, \mathbf{v} \rangle = (A\mathbf{v})^T A\mathbf{u}$$

or equivalently as

$$\langle \mathbf{u}, \mathbf{v} \rangle = \mathbf{v}^T A^T A\mathbf{u} \tag{6}$$

Every diagonal matrix with positive diagonal entries generates a weighted inner product. Why?


► EXAMPLE 4 Matrices Generating Weighted Euclidean Inner Products

The standard Euclidean and weighted Euclidean inner products are special cases of
matrix inner products. The standard Euclidean inner product on $R^n$ is generated by the
$n \times n$ identity matrix, since setting $A = I$ in Formula (5) yields

$$\langle \mathbf{u}, \mathbf{v} \rangle = I\mathbf{u} \cdot I\mathbf{v} = \mathbf{u} \cdot \mathbf{v}$$

and the weighted Euclidean inner product

$$\langle \mathbf{u}, \mathbf{v} \rangle = w_1u_1v_1 + w_2u_2v_2 + \cdots + w_nu_nv_n \tag{7}$$

is generated by the matrix

$$A = \begin{bmatrix} \sqrt{w_1} & 0 & 0 & \cdots & 0 \\ 0 & \sqrt{w_2} & 0 & \cdots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ 0 & 0 & 0 & \cdots & \sqrt{w_n} \end{bmatrix}$$

This can be seen by observing that $A^TA$ is the $n \times n$ diagonal matrix whose diagonal
entries are the weights $w_1, w_2, \ldots, w_n$.


► EXAMPLE 5 Example 1 Revisited

The weighted Euclidean inner product $\langle \mathbf{u}, \mathbf{v} \rangle = 3u_1v_1 + 2u_2v_2$ discussed in Example 1
is the inner product on $R^2$ generated by

$$A = \begin{bmatrix} \sqrt{3} & 0 \\ 0 & \sqrt{2} \end{bmatrix}$$


Other Examples of Inner Products

So far, we have only considered examples of inner products on $R^n$. We will now consider
examples of inner products on some of the other kinds of vector spaces that we discussed
earlier.
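Before leaving $R^n$, the claim in Example 5 can be checked numerically. The following sketch evaluates Formula (5) with the matrix $A$ whose diagonal entries are $\sqrt{3}$ and $\sqrt{2}$ and compares the result with Formula (3). The helper names and the sample vectors are assumptions made for this illustration; because square roots are computed in floating point, the comparison uses `math.isclose` rather than exact equality.

```python
from math import sqrt, isclose

def mat_vec(A, x):
    """Multiply a matrix (list of rows) by a column vector."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def dot(x, y):
    """Euclidean inner product on R^n."""
    return sum(a * b for a, b in zip(x, y))

def inner_generated_by(A, u, v):
    """Inner product generated by A, following Formula (5): <u, v> = Au . Av."""
    return dot(mat_vec(A, u), mat_vec(A, v))

A = [[sqrt(3), 0.0],
     [0.0, sqrt(2)]]
u, v = [1.0, -2.0], [4.0, 5.0]   # arbitrary sample vectors

from_matrix = inner_generated_by(A, u, v)
from_weights = 3 * u[0] * v[0] + 2 * u[1] * v[1]   # Formula (3)

print(isclose(from_matrix, from_weights))  # True
```

The same function accepts any invertible matrix, not only diagonal ones; a non-diagonal $A$ generally gives an inner product that is not a weighted Euclidean inner product.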

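► EXAMPLE 6 The Standard Inner Product on $M_{nn}$

If $\mathbf{u} = U$ and $\mathbf{v} = V$ are matrices in the vector space $M_{nn}$, then the formula

$$\langle \mathbf{u}, \mathbf{v} \rangle = \operatorname{tr}(U^TV) \tag{8}$$

defines an inner product on $M_{nn}$ called the standard inner product on that space (see
Definition 8 of Section 1.3 for a definition of trace). This can be proved by confirming
that the four inner product space axioms are satisfied, but we can see why this is so by
computing (8) for the $2 \times 2$ matrices

$$U = \begin{bmatrix} u_1 & u_2 \\ u_3 & u_4 \end{bmatrix} \quad\text{and}\quad V = \begin{bmatrix} v_1 & v_2 \\ v_3 & v_4 \end{bmatrix}$$

This yields

$$\langle \mathbf{u}, \mathbf{v} \rangle = \operatorname{tr}(U^TV) = u_1v_1 + u_2v_2 + u_3v_3 + u_4v_4$$

which is just the dot product of the corresponding entries in the two matrices. And it
follows from this that

$$\|\mathbf{u}\| = \sqrt{\langle \mathbf{u}, \mathbf{u} \rangle} = \sqrt{\operatorname{tr}(U^TU)} = \sqrt{u_1^2 + u_2^2 + u_3^2 + u_4^2}$$

For example, if

$$\mathbf{u} = U = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \quad\text{and}\quad \mathbf{v} = V = \begin{bmatrix} -1 & 0 \\ 3 & 2 \end{bmatrix}$$

then

$$\langle \mathbf{u}, \mathbf{v} \rangle = \operatorname{tr}(U^TV) = 1(-1) + 2(0) + 3(3) + 4(2) = 16$$

and

$$\|\mathbf{u}\| = \sqrt{\langle \mathbf{u}, \mathbf{u} \rangle} = \sqrt{\operatorname{tr}(U^TU)} = \sqrt{1^2 + 2^2 + 3^2 + 4^2} = \sqrt{30}$$

$$\|\mathbf{v}\| = \sqrt{\langle \mathbf{v}, \mathbf{v} \rangle} = \sqrt{\operatorname{tr}(V^TV)} = \sqrt{(-1)^2 + 0^2 + 3^2 + 2^2} = \sqrt{14}$$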

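► EXAMPLE 7 The Standard Inner Product on $P_n$

If

$$\mathbf{p} = a_0 + a_1x + \cdots + a_nx^n \quad\text{and}\quad \mathbf{q} = b_0 + b_1x + \cdots + b_nx^n$$

are polynomials in $P_n$, then the following formula defines an inner product on $P_n$ (verify)
that we will call the standard inner product on this space:

$$\langle \mathbf{p}, \mathbf{q} \rangle = a_0b_0 + a_1b_1 + \cdots + a_nb_n \tag{9}$$

The norm of a polynomial $\mathbf{p}$ relative to this inner product is

$$\|\mathbf{p}\| = \sqrt{\langle \mathbf{p}, \mathbf{p} \rangle} = \sqrt{a_0^2 + a_1^2 + \cdots + a_n^2}$$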


► EXAMPLE 8 The Evaluation Inner Product on $P_n$

If

$$\mathbf{p} = p(x) = a_0 + a_1x + \cdots + a_nx^n \quad\text{and}\quad \mathbf{q} = q(x) = b_0 + b_1x + \cdots + b_nx^n$$

are polynomials in $P_n$, and if $x_0, x_1, \ldots, x_n$ are distinct real numbers (called sample
points), then the formula

$$\langle \mathbf{p}, \mathbf{q} \rangle = p(x_0)q(x_0) + p(x_1)q(x_1) + \cdots + p(x_n)q(x_n) \tag{10}$$

defines an inner product on $P_n$ called the evaluation inner product at $x_0, x_1, \ldots, x_n$.
Algebraically, this can be viewed as the dot product in $R^{n+1}$ of the $(n+1)$-tuples

$$(p(x_0), p(x_1), \ldots, p(x_n)) \quad\text{and}\quad (q(x_0), q(x_1), \ldots, q(x_n))$$

and hence the first three inner product axioms follow from properties of the dot product.
The fourth inner product axiom follows from the fact that

$$\langle \mathbf{p}, \mathbf{p} \rangle = [p(x_0)]^2 + [p(x_1)]^2 + \cdots + [p(x_n)]^2 \geq 0$$

with equality holding if and only if

$$p(x_0) = p(x_1) = \cdots = p(x_n) = 0$$

But a nonzero polynomial of degree $n$ or less can have at most $n$ distinct roots, so it must
be that $\mathbf{p} = \mathbf{0}$, which proves that the fourth inner product axiom holds.

The norm of a polynomial $\mathbf{p}$ relative to the evaluation inner product is

$$\|\mathbf{p}\| = \sqrt{\langle \mathbf{p}, \mathbf{p} \rangle} = \sqrt{[p(x_0)]^2 + [p(x_1)]^2 + \cdots + [p(x_n)]^2} \tag{11}$$

► EXAMPLE 9 Working with the Evaluation Inner Product

Let $P_2$ have the evaluation inner product at the points

$$x_0 = -2, \quad x_1 = 0, \quad\text{and}\quad x_2 = 2$$

Compute $\langle \mathbf{p}, \mathbf{q} \rangle$ and $\|\mathbf{p}\|$ for the polynomials $\mathbf{p} = p(x) = x^2$ and $\mathbf{q} = q(x) = 1 + x$.

Solution It follows from (10) and (11) that

$$\langle \mathbf{p}, \mathbf{q} \rangle = p(-2)q(-2) + p(0)q(0) + p(2)q(2) = (4)(-1) + (0)(1) + (4)(3) = 8$$

$$\|\mathbf{p}\| = \sqrt{[p(x_0)]^2 + [p(x_1)]^2 + [p(x_2)]^2} = \sqrt{[p(-2)]^2 + [p(0)]^2 + [p(2)]^2} = \sqrt{4^2 + 0^2 + 4^2} = \sqrt{32} = 4\sqrt{2}$$
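Formulas (10) and (11) are straightforward to express in code. The sketch below repeats the computation of Example 9; the function name `evaluation_inner` is a hypothetical choice, and polynomials are represented simply as Python functions.

```python
from math import sqrt, isclose

def evaluation_inner(p, q, points):
    """Evaluation inner product, Formula (10): sum of p(x_i) * q(x_i)."""
    return sum(p(x) * q(x) for x in points)

def p(x):
    return x ** 2

def q(x):
    return 1 + x

points = (-2, 0, 2)   # sample points x_0, x_1, x_2 from Example 9

print(evaluation_inner(p, q, points))           # 8
norm_p = sqrt(evaluation_inner(p, p, points))   # Formula (11)
print(isclose(norm_p, 4 * sqrt(2)))             # True
```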


CALCULUS REQUIRED 


► EXAMPLE 10 An Integral Inner Product on C[a, b] 

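Let $\mathbf{f} = f(x)$ and $\mathbf{g} = g(x)$ be two functions in $C[a, b]$ and define

$$\langle \mathbf{f}, \mathbf{g} \rangle = \int_a^b f(x)g(x)\,dx \tag{12}$$

We will show that this formula defines an inner product on $C[a, b]$ by verifying the four inner product axioms for functions $\mathbf{f} = f(x)$, $\mathbf{g} = g(x)$, and $\mathbf{h} = h(x)$ in $C[a, b]$:

Axiom 1:
$$\langle \mathbf{f}, \mathbf{g} \rangle = \int_a^b f(x)g(x)\,dx = \int_a^b g(x)f(x)\,dx = \langle \mathbf{g}, \mathbf{f} \rangle$$

Axiom 2:
$$\langle \mathbf{f} + \mathbf{g}, \mathbf{h} \rangle = \int_a^b (f(x) + g(x))h(x)\,dx = \int_a^b f(x)h(x)\,dx + \int_a^b g(x)h(x)\,dx = \langle \mathbf{f}, \mathbf{h} \rangle + \langle \mathbf{g}, \mathbf{h} \rangle$$

Axiom 3:
$$\langle k\mathbf{f}, \mathbf{g} \rangle = \int_a^b kf(x)g(x)\,dx = k\int_a^b f(x)g(x)\,dx = k\langle \mathbf{f}, \mathbf{g} \rangle$$

Axiom 4: If $\mathbf{f} = f(x)$ is any function in $C[a, b]$, then

$$\langle \mathbf{f}, \mathbf{f} \rangle = \int_a^b f^2(x)\,dx \ge 0 \tag{13}$$

since $f^2(x) \ge 0$ for all $x$ in the interval $[a, b]$. Moreover, because $f$ is continuous on $[a, b]$, the equality in Formula (13) holds if and only if the function $f$ is identically zero on $[a, b]$, that is, if and only if $\mathbf{f} = \mathbf{0}$; and this proves that Axiom 4 holds.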
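
The integral inner product of Example 10 can be approximated numerically. The sketch below uses composite Simpson's rule, which is a choice made here for illustration and is not part of the text; the function name `integral_inner_product`, the interval $[0, \pi]$, and the test functions are all assumptions.

```python
import math

def integral_inner_product(f, g, a, b, n=1000):
    """Approximate <f, g> = integral of f(x) g(x) over [a, b].

    Uses composite Simpson's rule with n subintervals (n must be even).
    """
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    total = f(a) * g(a) + f(b) * g(b)
    for i in range(1, n):
        x = a + i * h
        total += (4 if i % 2 else 2) * f(x) * g(x)
    return total * h / 3

a, b = 0.0, math.pi
# Axiom 1 (symmetry): <sin, cos> and <cos, sin> agree.
sym_gap = (integral_inner_product(math.sin, math.cos, a, b)
           - integral_inner_product(math.cos, math.sin, a, b))
# Axiom 4 (positivity): <sin, sin> is positive; its exact value is pi/2.
sin_sq = integral_inner_product(math.sin, math.sin, a, b)
print(sym_gap, sin_sq)
```

A numerical check like this does not prove the axioms; it only illustrates them for particular functions.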


CALCULUS REQUIRED 


► EXAMPLE 11 Norm of a Vector in $C[a, b]$

If $C[a, b]$ has the inner product that was defined in Example 10, then the norm of a function $\mathbf{f} = f(x)$ relative to this inner product is

$$\|\mathbf{f}\| = \langle \mathbf{f}, \mathbf{f} \rangle^{1/2} = \sqrt{\int_a^b f^2(x)\,dx} \tag{14}$$

and the unit sphere in this space consists of all functions $\mathbf{f}$ in $C[a, b]$ that satisfy the equation

$$\int_a^b f^2(x)\,dx = 1$$

Remark Note that the vector space $P_n$ is a subspace of $C[a, b]$ because polynomials are continuous functions. Thus, Formula (12) defines an inner product on $P_n$ that is different from both the standard inner product and the evaluation inner product.

WARNING Recall from calculus that the arc length of a curve $y = f(x)$ over an interval $[a, b]$ is given by the formula

$$L = \int_a^b \sqrt{1 + [f'(x)]^2}\,dx \tag{15}$$

Do not confuse this concept of arc length with $\|\mathbf{f}\|$, which is the length (norm) of $\mathbf{f}$ when $\mathbf{f}$ is viewed as a vector in $C[a, b]$. Formulas (14) and (15) have different meanings.


Algebraic Properties of Inner Products

The following theorem lists some of the algebraic properties of inner products that follow 
from the inner product axioms. This result is a generalization of Theorem 3.2.3, which 
applied only to the dot product on R n . 


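THEOREM 6.1.2 If $\mathbf{u}$, $\mathbf{v}$, and $\mathbf{w}$ are vectors in a real inner product space $V$, and if $k$ is a scalar, then:

(a) $\langle \mathbf{0}, \mathbf{v} \rangle = \langle \mathbf{v}, \mathbf{0} \rangle = 0$

(b) $\langle \mathbf{u}, \mathbf{v} + \mathbf{w} \rangle = \langle \mathbf{u}, \mathbf{v} \rangle + \langle \mathbf{u}, \mathbf{w} \rangle$

(c) $\langle \mathbf{u}, \mathbf{v} - \mathbf{w} \rangle = \langle \mathbf{u}, \mathbf{v} \rangle - \langle \mathbf{u}, \mathbf{w} \rangle$

(d) $\langle \mathbf{u} - \mathbf{v}, \mathbf{w} \rangle = \langle \mathbf{u}, \mathbf{w} \rangle - \langle \mathbf{v}, \mathbf{w} \rangle$

(e) $k\langle \mathbf{u}, \mathbf{v} \rangle = \langle \mathbf{u}, k\mathbf{v} \rangle$

Proof We will prove part (b) and leave the proofs of the remaining parts as exercises.

$$\begin{aligned}
\langle \mathbf{u}, \mathbf{v} + \mathbf{w} \rangle &= \langle \mathbf{v} + \mathbf{w}, \mathbf{u} \rangle && \text{[By symmetry]} \\
&= \langle \mathbf{v}, \mathbf{u} \rangle + \langle \mathbf{w}, \mathbf{u} \rangle && \text{[By additivity]} \\
&= \langle \mathbf{u}, \mathbf{v} \rangle + \langle \mathbf{u}, \mathbf{w} \rangle && \text{[By symmetry]}
\end{aligned}$$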

The following example illustrates how Theorem 6.1.2 and the defining properties of inner products can be used to perform algebraic computations with inner products. As you read through the example, you will find it instructive to justify the steps.

► EXAMPLE 12 Calculating with Inner Products 

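$$\begin{aligned}
\langle \mathbf{u} - 2\mathbf{v}, 3\mathbf{u} + 4\mathbf{v} \rangle &= \langle \mathbf{u}, 3\mathbf{u} + 4\mathbf{v} \rangle - \langle 2\mathbf{v}, 3\mathbf{u} + 4\mathbf{v} \rangle \\
&= \langle \mathbf{u}, 3\mathbf{u} \rangle + \langle \mathbf{u}, 4\mathbf{v} \rangle - \langle 2\mathbf{v}, 3\mathbf{u} \rangle - \langle 2\mathbf{v}, 4\mathbf{v} \rangle \\
&= 3\langle \mathbf{u}, \mathbf{u} \rangle + 4\langle \mathbf{u}, \mathbf{v} \rangle - 6\langle \mathbf{v}, \mathbf{u} \rangle - 8\langle \mathbf{v}, \mathbf{v} \rangle \\
&= 3\|\mathbf{u}\|^2 + 4\langle \mathbf{u}, \mathbf{v} \rangle - 6\langle \mathbf{u}, \mathbf{v} \rangle - 8\|\mathbf{v}\|^2 \\
&= 3\|\mathbf{u}\|^2 - 2\langle \mathbf{u}, \mathbf{v} \rangle - 8\|\mathbf{v}\|^2
\end{aligned}$$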
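
The identity derived in Example 12 holds in every real inner product space, so it can be spot-checked numerically. The sketch below assumes the weighted Euclidean inner product $\langle \mathbf{u}, \mathbf{v} \rangle = 2u_1v_1 + 3u_2v_2$ on $R^2$ and two arbitrarily chosen vectors; the helper names `weighted_ip` and `combo` are hypothetical.

```python
def weighted_ip(u, v, weights=(2.0, 3.0)):
    """Weighted Euclidean inner product: sum of w_i * u_i * v_i."""
    return sum(w * a * b for w, a, b in zip(weights, u, v))

def combo(a, u, b, v):
    """Linear combination a*u + b*v, componentwise."""
    return tuple(a * x + b * y for x, y in zip(u, v))

# Arbitrary test vectors (an assumption, not from the text).
u = (1.0, -2.0)
v = (3.0, 0.5)

# Left side: <u - 2v, 3u + 4v>
lhs = weighted_ip(combo(1, u, -2, v), combo(3, u, 4, v))
# Right side: 3||u||^2 - 2<u, v> - 8||v||^2
rhs = 3 * weighted_ip(u, u) - 2 * weighted_ip(u, v) - 8 * weighted_ip(v, v)
print(lhs, rhs)  # -114.0 -114.0
```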
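Exercise Set 6.1

1. Let $R^2$ have the weighted Euclidean inner product

$\langle \mathbf{u}, \mathbf{v} \rangle = 2u_1v_1 + 3u_2v_2$

and let $\mathbf{u} = (1, 1)$, $\mathbf{v} = (3, 2)$, $\mathbf{w} = (0, -1)$, and $k = 3$. Compute the stated quantities.

(a) $\langle \mathbf{u}, \mathbf{v} \rangle$ (b) $\langle k\mathbf{v}, \mathbf{w} \rangle$ (c) $\langle \mathbf{u} + \mathbf{v}, \mathbf{w} \rangle$

(d) $\|\mathbf{v}\|$ (e) $d(\mathbf{u}, \mathbf{v})$ (f) $\|\mathbf{u} - k\mathbf{v}\|$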

2. Follow the directions of Exercise 1 using the weighted Euclidean inner product

$\langle \mathbf{u}, \mathbf{v} \rangle = \tfrac{1}{2}u_1v_1 + 5u_2v_2$

In Exercises 3-4, compute the quantities in parts (a)-(f) of Exercise 1 using the inner product on $R^2$ generated by $A$.

3. $A = \begin{bmatrix} 2 & 1 \\ 1 & 1 \end{bmatrix}$   4. $A = \begin{bmatrix} 1 & 0 \\ 2 & -1 \end{bmatrix}$


In Exercises 5-6, find a matrix that generates the stated weighted inner product on $R^2$.

5. $\langle \mathbf{u}, \mathbf{v} \rangle = 2u_1v_1 + 3u_2v_2$   6. $\langle \mathbf{u}, \mathbf{v} \rangle = \tfrac{1}{2}u_1v_1 + 5u_2v_2$

In Exercises 7-8, use the inner product on $R^2$ generated by the matrix $A$ to find $\langle \mathbf{u}, \mathbf{v} \rangle$ for the vectors $\mathbf{u} = (0, -3)$ and $\mathbf{v} = (6, 2)$.



"4 l" 


2 l" 

Ik. 

II 

2 -3 

II 

a 6 

1 

CO 

7 

1 


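In Exercises 9-10, compute the standard inner product on $M_{22}$ of the given matrices.

9. $U = \begin{bmatrix} 3 & -2 \\ 4 & 8 \end{bmatrix}$, $V = \begin{bmatrix} -1 & 3 \\ 1 & 1 \end{bmatrix}$

10. $U = \begin{bmatrix} 1 & 2 \\ -3 & 5 \end{bmatrix}$, $V = \begin{bmatrix} 4 & 6 \\ 0 & 8 \end{bmatrix}$
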

In Exercises 11-12, find the standard inner product on $P_2$ of the given polynomials.

11. $\mathbf{p} = -2 + x + 3x^2$, $\mathbf{q} = 4 - 7x^2$

12. $\mathbf{p} = -5 + 2x + x^2$, $\mathbf{q} = 3 + 2x - 4x^2$

In Exercises 13-14, a weighted Euclidean inner product on $R^2$ is given for the vectors $\mathbf{u} = (u_1, u_2)$ and $\mathbf{v} = (v_1, v_2)$. Find a matrix that generates it.

13. $\langle \mathbf{u}, \mathbf{v} \rangle = 3u_1v_1 + 5u_2v_2$   14. $\langle \mathbf{u}, \mathbf{v} \rangle = 4u_1v_1 + 6u_2v_2$

In Exercises 15-16, a sequence of sample points is given. Use the evaluation inner product on $P_3$ at those sample points to find $\langle \mathbf{p}, \mathbf{q} \rangle$ for the polynomials

$\mathbf{p} = x + x^3$ and $\mathbf{q} = 1 + x^2$

15. $x_0 = -2$, $x_1 = -1$, $x_2 = 0$, $x_3 = 1$

16. $x_0 = -1$, $x_1 = 0$, $x_2 = 1$, $x_3 = 2$

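In Exercises 17-18, find $\|\mathbf{u}\|$ and $d(\mathbf{u}, \mathbf{v})$ relative to the weighted Euclidean inner product $\langle \mathbf{u}, \mathbf{v} \rangle = 2u_1v_1 + 3u_2v_2$ on $R^2$.

17. $\mathbf{u} = (-3, 2)$ and $\mathbf{v} = (1, 7)$

18. $\mathbf{u} = (-1, 2)$ and $\mathbf{v} = (2, 5)$

In Exercises 19-20, find $\|\mathbf{p}\|$ and $d(\mathbf{p}, \mathbf{q})$ relative to the standard inner product on $P_2$.

19. $\mathbf{p} = -2 + x + 3x^2$, $\mathbf{q} = 4 - 7x^2$

20. $\mathbf{p} = -5 + 2x + x^2$, $\mathbf{q} = 3 + 2x - 4x^2$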

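In Exercises 21-22, find $\|U\|$ and $d(U, V)$ relative to the standard inner product on $M_{22}$.

21. $U = \begin{bmatrix} 3 & -2 \\ 4 & 8 \end{bmatrix}$, $V = \begin{bmatrix} -1 & 3 \\ 1 & 1 \end{bmatrix}$

22. $U = \begin{bmatrix} 1 & 2 \\ -3 & 5 \end{bmatrix}$, $V = \begin{bmatrix} 4 & 6 \\ 0 & 8 \end{bmatrix}$
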

In Exercises 23-24, let

$\mathbf{p} = x + x^3$ and $\mathbf{q} = 1 + x^2$

Find $\|\mathbf{p}\|$ and $d(\mathbf{p}, \mathbf{q})$ relative to the evaluation inner product on $P_3$ at the stated sample points.

23. $x_0 = -2$, $x_1 = -1$, $x_2 = 0$, $x_3 = 1$

24. $x_0 = -1$, $x_1 = 0$, $x_2 = 1$, $x_3 = 2$

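In Exercises 25-26, find $\|\mathbf{u}\|$ and $d(\mathbf{u}, \mathbf{v})$ for the vectors $\mathbf{u} = (-1, 2)$ and $\mathbf{v} = (2, 5)$ relative to the inner product on $R^2$ generated by the matrix $A$.

25. $A = \begin{bmatrix} 4 & 0 \\ 3 & 5 \end{bmatrix}$   26. $A = \begin{bmatrix} 1 & 2 \\ -1 & 3 \end{bmatrix}$
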

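In Exercises 27-28, suppose that $\mathbf{u}$, $\mathbf{v}$, and $\mathbf{w}$ are vectors in an inner product space such that

$\langle \mathbf{u}, \mathbf{v} \rangle = 2$, $\langle \mathbf{v}, \mathbf{w} \rangle = -6$, $\langle \mathbf{u}, \mathbf{w} \rangle = -3$

$\|\mathbf{u}\| = 1$, $\|\mathbf{v}\| = 2$, $\|\mathbf{w}\| = 7$

Evaluate the given expression.

27. (a) $\langle 2\mathbf{v} - \mathbf{w}, 3\mathbf{u} + 2\mathbf{w} \rangle$ (b) $\|\mathbf{u} + \mathbf{v}\|$

28. (a) $\langle \mathbf{u} - \mathbf{v} - 2\mathbf{w}, 4\mathbf{u} + \mathbf{v} \rangle$ (b) $\|2\mathbf{w} - \mathbf{v}\|$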

In Exercises 29-30, sketch the unit circle in $R^2$ using the given inner product.

29. $\langle \mathbf{u}, \mathbf{v} \rangle = \tfrac{1}{4}u_1v_1 + \tfrac{1}{16}u_2v_2$   30. $\langle \mathbf{u}, \mathbf{v} \rangle = 2u_1v_1 + u_2v_2$

In Exercises 31-32, find a weighted Euclidean inner product on $R^2$ for which the "unit circle" is the ellipse shown in the accompanying figure.




43. (a) Let $\mathbf{u} = (u_1, u_2)$ and $\mathbf{v} = (v_1, v_2)$. Prove that $\langle \mathbf{u}, \mathbf{v} \rangle = 3u_1v_1 + 5u_2v_2$ defines an inner product on $R^2$ by showing that the inner product axioms hold.

(b) What conditions must $k_1$ and $k_2$ satisfy for $\langle \mathbf{u}, \mathbf{v} \rangle = k_1u_1v_1 + k_2u_2v_2$ to define an inner product on $R^2$? Justify your answer.

44. Prove that the following identity holds for vectors in any inner product space.

$\langle \mathbf{u}, \mathbf{v} \rangle = \tfrac{1}{4}\|\mathbf{u} + \mathbf{v}\|^2 - \tfrac{1}{4}\|\mathbf{u} - \mathbf{v}\|^2$

45. Prove that the following identity holds for vectors in any inner 
product space. 


In Exercises 33-34, let $\mathbf{u} = (u_1, u_2, u_3)$ and $\mathbf{v} = (v_1, v_2, v_3)$. Show that the expression does not define an inner product on $R^3$, and list all inner product axioms that fail to hold.

33. $\langle \mathbf{u}, \mathbf{v} \rangle = u_1^2v_1^2 + u_2^2v_2^2 + u_3^2v_3^2$

34. $\langle \mathbf{u}, \mathbf{v} \rangle = u_1v_1 - u_2v_2 + u_3v_3$

In Exercises 35-36, suppose that u and v are vectors in an in- 
ner product space. Rewrite the given expression in terms of (u, v), 
||u|| 2 , and ||v|| 2 . 

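35. $\langle 2\mathbf{v} - 4\mathbf{u}, \mathbf{u} - 3\mathbf{v} \rangle$   36. $\langle 5\mathbf{u} + 6\mathbf{v}, 4\mathbf{v} - 3\mathbf{u} \rangle$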

37. {Calculus required ) Let the vector space P 2 have the inner 
product 

(p. q) = J p{x)q{x)dx 

Find the following for p = 1 and q = x 2 . 

(a) (p, q) (b) d(p, q) 

(c) IIPII (d) ||q|| 


$\|\mathbf{u} + \mathbf{v}\|^2 + \|\mathbf{u} - \mathbf{v}\|^2 = 2\|\mathbf{u}\|^2 + 2\|\mathbf{v}\|^2$

46. The definition of a complex vector space was given in the first margin note in Section 4.1. The definition of a complex inner product on a complex vector space $V$ is identical to that in Definition 1 except that scalars are allowed to be complex numbers, and Axiom 1 is replaced by $\langle \mathbf{u}, \mathbf{v} \rangle = \overline{\langle \mathbf{v}, \mathbf{u} \rangle}$. The remaining axioms are unchanged. A complex vector space with a complex inner product is called a complex inner product space. Prove that if $V$ is a complex inner product space, then $\langle \mathbf{u}, k\mathbf{v} \rangle = \overline{k}\langle \mathbf{u}, \mathbf{v} \rangle$.

47. Prove that Formula (5) defines an inner product on R n . 

48. (a) Prove that if $\mathbf{v}$ is a fixed vector in a real inner product space $V$, then the mapping $T: V \to R$ defined by $T(\mathbf{x}) = \langle \mathbf{x}, \mathbf{v} \rangle$ is a linear transformation.

(b) Let $V = R^3$ have the Euclidean inner product, and let $\mathbf{v} = (1, 0, 2)$. Compute $T(1, 1, 1)$.

(c) Let $V = P_2$ have the standard inner product, and let $\mathbf{v} = 1 + x$. Compute $T(x + x^2)$.


38. ( Calculus required) Let the vector space P 2 have the inner 
product 

(P. q) = J p( x )q(x) dx 

Find the following for p = 2x 2 and q = 1 — x 2 . 

(a) (p, q) (b) dip. q) 

(c) llpll (d) q 

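(Calculus required) In Exercises 39-40, use the inner product

$\langle \mathbf{f}, \mathbf{g} \rangle = \int_0^1 f(x)g(x)\,dx$

on $C[0, 1]$ to compute $\langle \mathbf{f}, \mathbf{g} \rangle$.

39. $\mathbf{f} = \cos 2\pi x$, $\mathbf{g} = \sin 2\pi x$   40. $\mathbf{f} = x$, $\mathbf{g} = e^x$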

Working with Proofs 

41. Prove parts (a) and ( b ) of Theorem 6.1.1. 

42. Prove parts (c) and {d) of Theorem 6.1.1. 


(d) Let $V = P_2$ have the evaluation inner product at the points $x_0 = 1$, $x_1 = 0$, $x_2 = -1$, and let $\mathbf{v} = 1 + x$. Compute $T(x + x^2)$.

True-False Exercises 

TF. In parts (a)-(g) determine whether the statement is true or 

false, and justify your answer. 

(a) The dot product on R 2 is an example of a weighted inner 
product. 

(b) The inner product of two vectors cannot be a negative real 
number. 

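(c) $\langle \mathbf{u}, \mathbf{v} + \mathbf{w} \rangle = \langle \mathbf{v}, \mathbf{u} \rangle + \langle \mathbf{w}, \mathbf{u} \rangle$.

(d) $\langle k\mathbf{u}, k\mathbf{v} \rangle = k^2\langle \mathbf{u}, \mathbf{v} \rangle$.

(e) If $\langle \mathbf{u}, \mathbf{v} \rangle = 0$, then $\mathbf{u} = \mathbf{0}$ or $\mathbf{v} = \mathbf{0}$.

(f) If $\|\mathbf{v}\|^2 = 0$, then $\mathbf{v} = \mathbf{0}$.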

(g) If $A$ is an $n \times n$ matrix, then $\langle \mathbf{u}, \mathbf{v} \rangle = A\mathbf{u} \cdot A\mathbf{v}$ defines an inner product on $R^n$.

Working with Technology

T1. (a) Confirm that the following matrix generates an inner product.

$$A = \begin{bmatrix} 5 & 8 & 6 & -13 \\ 3 & -1 & 0 & -9 \\ 0 & 1 & -1 & 0 \\ 2 & 4 & 3 & -5 \end{bmatrix}$$

(b) For the following vectors, use the inner product in part (a) to compute $\langle \mathbf{u}, \mathbf{v} \rangle$, first by Formula (5) and then by Formula (6).

$$\mathbf{u} = \begin{bmatrix} 1 \\ -2 \\ 0 \\ 3 \end{bmatrix} \quad\text{and}\quad \mathbf{v} = \begin{bmatrix} 0 \\ 1 \\ -1 \\ 2 \end{bmatrix}$$

T2. Let the vector space $P_4$ have the evaluation inner product at the points

$-2, -1, 0, 1, 2$

and let

$\mathbf{p} = p(x) = x + x^3$ and $\mathbf{q} = q(x) = 1 + x^2 + x^4$

(a) Compute $\langle \mathbf{p}, \mathbf{q} \rangle$, $\|\mathbf{p}\|$, and $\|\mathbf{q}\|$.

(b) Verify that the identities in Exercises 44 and 45 hold for the vectors $\mathbf{p}$ and $\mathbf{q}$.


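T3. Let the vector space $M_{33}$ have the standard inner product and let

$$\mathbf{u} = U = \begin{bmatrix} 1 & -2 & 3 \\ -2 & 4 & 1 \\ 3 & 1 & 0 \end{bmatrix} \quad\text{and}\quad \mathbf{v} = V = \begin{bmatrix} 2 & -1 & 0 \\ 1 & 4 & 3 \\ 1 & 0 & 2 \end{bmatrix}$$
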

(a) Use Formula (8) to compute (u, v), ||u||, and ||v||. 

(b) Verify that the identities in Exercises 44 and 45 hold for the 
vectors u and v. 


6.2 Angle and Orthogonality in Inner Product Spaces 

In Section 3.2 we defined the notion of “angle” between vectors in $R^n$. In this section we will extend this idea to general vector spaces. This will enable us to extend the notion of orthogonality as well, thereby setting the groundwork for a variety of new applications.


Cauchy-Schwarz Inequality

Recall from Formula (20) of Section 3.2 that the angle $\theta$ between two vectors $\mathbf{u}$ and $\mathbf{v}$ in $R^n$ is

$$\theta = \cos^{-1}\left(\frac{\mathbf{u} \cdot \mathbf{v}}{\|\mathbf{u}\|\,\|\mathbf{v}\|}\right) \tag{1}$$

We were assured that this formula was valid because it followed from the Cauchy-Schwarz inequality (Theorem 3.2.4) that

$$-1 \le \frac{\mathbf{u} \cdot \mathbf{v}}{\|\mathbf{u}\|\,\|\mathbf{v}\|} \le 1 \tag{2}$$

as required for the inverse cosine to be defined. The following generalization of the Cauchy-Schwarz inequality will enable us to define the angle between two vectors in any real inner product space.


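THEOREM 6.2.1 Cauchy-Schwarz Inequality

If $\mathbf{u}$ and $\mathbf{v}$ are vectors in a real inner product space $V$, then

$$|\langle \mathbf{u}, \mathbf{v} \rangle| \le \|\mathbf{u}\|\,\|\mathbf{v}\| \tag{3}$$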


Proof We warn you in advance that the proof presented here depends on a clever trick 
that is not easy to motivate. 

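In the case where $\mathbf{u} = \mathbf{0}$ the two sides of (3) are equal since $\langle \mathbf{u}, \mathbf{v} \rangle$ and $\|\mathbf{u}\|$ are both zero. Thus, we need only consider the case where $\mathbf{u} \ne \mathbf{0}$. Making this assumption, let

$$a = \langle \mathbf{u}, \mathbf{u} \rangle, \quad b = 2\langle \mathbf{u}, \mathbf{v} \rangle, \quad c = \langle \mathbf{v}, \mathbf{v} \rangle$$

and let $t$ be any real number. Since the positivity axiom states that the inner product of any vector with itself is nonnegative, it follows that

$$0 \le \langle t\mathbf{u} + \mathbf{v}, t\mathbf{u} + \mathbf{v} \rangle = \langle \mathbf{u}, \mathbf{u} \rangle t^2 + 2\langle \mathbf{u}, \mathbf{v} \rangle t + \langle \mathbf{v}, \mathbf{v} \rangle = at^2 + bt + c$$

This inequality implies that the quadratic polynomial $at^2 + bt + c$ has either no real roots or a repeated real root. Therefore, its discriminant must satisfy the inequality $b^2 - 4ac \le 0$. Expressing the coefficients $a$, $b$, and $c$ in terms of the vectors $\mathbf{u}$ and $\mathbf{v}$ gives $4\langle \mathbf{u}, \mathbf{v} \rangle^2 - 4\langle \mathbf{u}, \mathbf{u} \rangle\langle \mathbf{v}, \mathbf{v} \rangle \le 0$ or, equivalently,

$$\langle \mathbf{u}, \mathbf{v} \rangle^2 \le \langle \mathbf{u}, \mathbf{u} \rangle\langle \mathbf{v}, \mathbf{v} \rangle$$

Taking square roots of both sides and using the fact that $\langle \mathbf{u}, \mathbf{u} \rangle$ and $\langle \mathbf{v}, \mathbf{v} \rangle$ are nonnegative yields

$$|\langle \mathbf{u}, \mathbf{v} \rangle| \le \langle \mathbf{u}, \mathbf{u} \rangle^{1/2}\langle \mathbf{v}, \mathbf{v} \rangle^{1/2} \quad\text{or equivalently}\quad |\langle \mathbf{u}, \mathbf{v} \rangle| \le \|\mathbf{u}\|\,\|\mathbf{v}\|$$

which completes the proof.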
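
The two quantities that drive this proof, the gap $\|\mathbf{u}\|\,\|\mathbf{v}\| - |\langle \mathbf{u}, \mathbf{v} \rangle|$ and the discriminant $b^2 - 4ac$, can be examined numerically. The following sketch assumes a weighted Euclidean inner product on $R^3$ with weights 2, 3, 1 and randomly generated vectors; the weights, the random seed, and the helper names are all illustrative assumptions.

```python
import math
import random

def inner(u, v, weights):
    """Weighted Euclidean inner product with positive weights."""
    return sum(w * a * b for w, a, b in zip(weights, u, v))

def cauchy_schwarz_gap(u, v, weights):
    """||u|| ||v|| - |<u, v>|, which Formula (3) says is never negative."""
    norm_u = math.sqrt(inner(u, u, weights))
    norm_v = math.sqrt(inner(v, v, weights))
    return norm_u * norm_v - abs(inner(u, v, weights))

def discriminant(u, v, weights):
    """b^2 - 4ac with a = <u,u>, b = 2<u,v>, c = <v,v>, as in the proof."""
    a = inner(u, u, weights)
    b = 2 * inner(u, v, weights)
    c = inner(v, v, weights)
    return b * b - 4 * a * c

random.seed(0)
weights = (2.0, 3.0, 1.0)
trials = []
for _ in range(1000):
    u = [random.uniform(-5, 5) for _ in weights]
    v = [random.uniform(-5, 5) for _ in weights]
    trials.append((cauchy_schwarz_gap(u, v, weights), discriminant(u, v, weights)))

worst_gap = min(gap for gap, _ in trials)     # smallest gap seen
worst_disc = max(disc for _, disc in trials)  # largest discriminant seen
print(worst_gap >= 0, worst_disc <= 0)
```

The gap is close to zero only when the two vectors are nearly parallel, which matches the equality case treated in the exercises.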

The following two alternative forms of the Cauchy-Schwarz inequality are useful to 
know: 


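$$\langle \mathbf{u}, \mathbf{v} \rangle^2 \le \langle \mathbf{u}, \mathbf{u} \rangle\langle \mathbf{v}, \mathbf{v} \rangle \tag{4}$$

$$\langle \mathbf{u}, \mathbf{v} \rangle^2 \le \|\mathbf{u}\|^2\|\mathbf{v}\|^2 \tag{5}$$

The first of these formulas was obtained in the proof of Theorem 6.2.1, and the second is a variation of the first.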


Angle Between Vectors 


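Our next goal is to define what is meant by the “angle” between vectors in a real inner product space. As a first step, we leave it as an exercise for you to use the Cauchy-Schwarz inequality to show that

$$-1 \le \frac{\langle \mathbf{u}, \mathbf{v} \rangle}{\|\mathbf{u}\|\,\|\mathbf{v}\|} \le 1 \tag{6}$$

This being the case, there is a unique angle $\theta$ in radian measure for which

$$\cos\theta = \frac{\langle \mathbf{u}, \mathbf{v} \rangle}{\|\mathbf{u}\|\,\|\mathbf{v}\|} \quad\text{and}\quad 0 \le \theta \le \pi \tag{7}$$

(Figure 6.2.1). This enables us to define the angle $\theta$ between $\mathbf{u}$ and $\mathbf{v}$ to be

$$\theta = \cos^{-1}\left(\frac{\langle \mathbf{u}, \mathbf{v} \rangle}{\|\mathbf{u}\|\,\|\mathbf{v}\|}\right) \tag{8}$$

► Figure 6.2.1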
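
Formula (8) translates directly into code once an inner product is available as a function. The sketch below is an illustration under stated assumptions: `frobenius_ip` implements the standard inner product on $M_{22}$ (sum of products of corresponding entries), `angle` is a hypothetical helper, and the matrices are the ones used in Example 1 of this section.

```python
import math

def frobenius_ip(U, V):
    """Standard inner product on M22: sum of products of corresponding entries."""
    return sum(a * b for row_u, row_v in zip(U, V) for a, b in zip(row_u, row_v))

def angle(u, v, ip):
    """Angle from Formula (8), in radians, for nonzero u and v."""
    cos_theta = ip(u, v) / math.sqrt(ip(u, u) * ip(v, v))
    # Clamp to [-1, 1] so round-off cannot push acos out of its domain.
    cos_theta = max(-1.0, min(1.0, cos_theta))
    return math.acos(cos_theta)

U = [[1, 2], [3, 4]]
V = [[-1, 0], [3, 2]]
theta = angle(U, V, frobenius_ip)
cos_theta = math.cos(theta)
print(round(cos_theta, 2))  # 0.78
```

Passing the inner product as an argument keeps `angle` independent of the particular space, which mirrors how the definition works in any real inner product space.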
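► EXAMPLE 1 Cosine of the Angle Between Vectors in $M_{22}$

Let $M_{22}$ have the standard inner product. Find the cosine of the angle between the vectors

$$\mathbf{u} = U = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \quad\text{and}\quad \mathbf{v} = V = \begin{bmatrix} -1 & 0 \\ 3 & 2 \end{bmatrix}$$

Solution We showed in Example 6 of the previous section that

$$\langle \mathbf{u}, \mathbf{v} \rangle = 16, \quad \|\mathbf{u}\| = \sqrt{30}, \quad \|\mathbf{v}\| = \sqrt{14}$$

from which it follows that

$$\cos\theta = \frac{\langle \mathbf{u}, \mathbf{v} \rangle}{\|\mathbf{u}\|\,\|\mathbf{v}\|} = \frac{16}{\sqrt{30}\sqrt{14}} \approx 0.78$$


Properties of Length and Distance in General Inner Product Spaces

In Section 3.2 we used the dot product to extend the notions of length and distance to $R^n$, and we showed that various basic geometry theorems remained valid (see Theorems 3.2.5, 3.2.6, and 3.2.7). By making only minor adjustments to the proofs of those theorems, one can show that they remain valid in any real inner product space. For example, here is the generalization of Theorem 3.2.5 (the triangle inequalities).


THEOREM 6.2.2 If $\mathbf{u}$, $\mathbf{v}$, and $\mathbf{w}$ are vectors in a real inner product space $V$, and if $k$ is any scalar, then:

(a) $\|\mathbf{u} + \mathbf{v}\| \le \|\mathbf{u}\| + \|\mathbf{v}\|$ [Triangle inequality for vectors]

(b) $d(\mathbf{u}, \mathbf{v}) \le d(\mathbf{u}, \mathbf{w}) + d(\mathbf{w}, \mathbf{v})$ [Triangle inequality for distances]


Proof (a)

$$\begin{aligned}
\|\mathbf{u} + \mathbf{v}\|^2 &= \langle \mathbf{u} + \mathbf{v}, \mathbf{u} + \mathbf{v} \rangle \\
&= \langle \mathbf{u}, \mathbf{u} \rangle + 2\langle \mathbf{u}, \mathbf{v} \rangle + \langle \mathbf{v}, \mathbf{v} \rangle \\
&\le \langle \mathbf{u}, \mathbf{u} \rangle + 2|\langle \mathbf{u}, \mathbf{v} \rangle| + \langle \mathbf{v}, \mathbf{v} \rangle && \text{[Property of absolute value]} \\
&\le \langle \mathbf{u}, \mathbf{u} \rangle + 2\|\mathbf{u}\|\,\|\mathbf{v}\| + \langle \mathbf{v}, \mathbf{v} \rangle && \text{[By (3)]} \\
&= \|\mathbf{u}\|^2 + 2\|\mathbf{u}\|\,\|\mathbf{v}\| + \|\mathbf{v}\|^2 \\
&= (\|\mathbf{u}\| + \|\mathbf{v}\|)^2
\end{aligned}$$

Taking square roots gives $\|\mathbf{u} + \mathbf{v}\| \le \|\mathbf{u}\| + \|\mathbf{v}\|$.

Proof (b) Identical to the proof of part (b) of Theorem 3.2.5.


Orthogonality
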
Although Example 1 is a useful mathematical exercise, there is only an occasional need to compute angles in vector spaces other than $R^2$ and $R^3$. A problem of more interest in general vector spaces is ascertaining whether the angle between vectors is $\pi/2$. You should be able to see from Formula (8) that if $\mathbf{u}$ and $\mathbf{v}$ are nonzero vectors, then the angle between them is $\theta = \pi/2$ if and only if $\langle \mathbf{u}, \mathbf{v} \rangle = 0$. Accordingly, we make the following definition, which is a generalization of Definition 1 in Section 3.3 and is applicable even if one or both of the vectors is zero.


DEFINITION 1 Two vectors $\mathbf{u}$ and $\mathbf{v}$ in an inner product space $V$ are called orthogonal if $\langle \mathbf{u}, \mathbf{v} \rangle = 0$.

As the following example shows, orthogonality depends on the inner product in the sense that for different inner products two vectors can be orthogonal with respect to one but not the other.

► EXAMPLE 2 Orthogonality Depends on the Inner Product 

The vectors $\mathbf{u} = (1, 1)$ and $\mathbf{v} = (1, -1)$ are orthogonal with respect to the Euclidean inner product on $R^2$ since

$$\mathbf{u} \cdot \mathbf{v} = (1)(1) + (1)(-1) = 0$$

However, they are not orthogonal with respect to the weighted Euclidean inner product $\langle \mathbf{u}, \mathbf{v} \rangle = 3u_1v_1 + 2u_2v_2$ since

$$\langle \mathbf{u}, \mathbf{v} \rangle = 3(1)(1) + 2(1)(-1) = 1 \ne 0$$
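
Example 2 can be reproduced with a single helper that takes the weights as a parameter. This is a minimal sketch; `weighted_ip` is a hypothetical name, and the Euclidean inner product is treated as the special case with all weights equal to 1.

```python
def weighted_ip(u, v, weights):
    """Weighted Euclidean inner product: sum of w_i * u_i * v_i."""
    return sum(w * a * b for w, a, b in zip(weights, u, v))

u = (1, 1)
v = (1, -1)

euclidean = weighted_ip(u, v, (1, 1))  # ordinary dot product
weighted = weighted_ip(u, v, (3, 2))   # <u, v> = 3*u1*v1 + 2*u2*v2

print(euclidean)  # 0, so u and v are orthogonal here
print(weighted)   # 1, so u and v are not orthogonal here
```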


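► EXAMPLE 3 Orthogonal Vectors in $M_{22}$

If $M_{22}$ has the inner product of Example 6 in the preceding section, then the matrices

$$U = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix} \quad\text{and}\quad V = \begin{bmatrix} 0 & 2 \\ 0 & 0 \end{bmatrix}$$

are orthogonal since

$$\langle U, V \rangle = 1(0) + 0(2) + 1(0) + 1(0) = 0$$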


CALCULUS REQUIRED 


► EXAMPLE 4 Orthogonal Vectors in P2 

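Let $P_2$ have the inner product

$$\langle \mathbf{p}, \mathbf{q} \rangle = \int_{-1}^{1} p(x)q(x)\,dx$$

and let $\mathbf{p} = x$ and $\mathbf{q} = x^2$. Then

$$\|\mathbf{p}\| = \langle \mathbf{p}, \mathbf{p} \rangle^{1/2} = \left[\int_{-1}^{1} x \cdot x\,dx\right]^{1/2} = \left[\int_{-1}^{1} x^2\,dx\right]^{1/2} = \sqrt{\frac{2}{3}}$$

$$\|\mathbf{q}\| = \langle \mathbf{q}, \mathbf{q} \rangle^{1/2} = \left[\int_{-1}^{1} x^2 \cdot x^2\,dx\right]^{1/2} = \left[\int_{-1}^{1} x^4\,dx\right]^{1/2} = \sqrt{\frac{2}{5}}$$

$$\langle \mathbf{p}, \mathbf{q} \rangle = \int_{-1}^{1} x \cdot x^2\,dx = \int_{-1}^{1} x^3\,dx = 0$$

Because $\langle \mathbf{p}, \mathbf{q} \rangle = 0$, the vectors $\mathbf{p} = x$ and $\mathbf{q} = x^2$ are orthogonal relative to the given inner product.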
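
For polynomials, the integrals in Example 4 can be evaluated exactly rather than approximated. The sketch below assumes a polynomial is stored as a list of coefficients in increasing powers of $x$ and uses Python's `fractions.Fraction` for exact arithmetic; `poly_mul` and `integral_ip` are hypothetical helper names.

```python
from fractions import Fraction

def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists [c0, c1, c2, ...]."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += Fraction(a) * Fraction(b)
    return out

def integral_ip(p, q, a=-1, b=1):
    """<p, q> = integral over [a, b] of p(x) q(x) dx, computed exactly."""
    prod = poly_mul(p, q)
    # The antiderivative of c * x^k is c * x^(k+1) / (k+1).
    return sum(c * (Fraction(b) ** (k + 1) - Fraction(a) ** (k + 1)) / (k + 1)
               for k, c in enumerate(prod))

p = [0, 1]     # p(x) = x
q = [0, 0, 1]  # q(x) = x^2

ip_pq = integral_ip(p, q)
norm_p_sq = integral_ip(p, p)
norm_q_sq = integral_ip(q, q)
print(ip_pq, norm_p_sq, norm_q_sq)  # 0 2/3 2/5
```

The squared norms 2/3 and 2/5 agree with the values $\sqrt{2/3}$ and $\sqrt{2/5}$ for the norms themselves.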


In Theorem 3.3.3 we proved the Theorem of Pythagoras for vectors in Euclidean 
n -space. The following theorem extends this result to vectors in any real inner product 
space. 


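THEOREM 6.2.3 Generalized Theorem of Pythagoras

If $\mathbf{u}$ and $\mathbf{v}$ are orthogonal vectors in a real inner product space, then

$$\|\mathbf{u} + \mathbf{v}\|^2 = \|\mathbf{u}\|^2 + \|\mathbf{v}\|^2$$

Proof The orthogonality of $\mathbf{u}$ and $\mathbf{v}$ implies that $\langle \mathbf{u}, \mathbf{v} \rangle = 0$, so

$$\|\mathbf{u} + \mathbf{v}\|^2 = \langle \mathbf{u} + \mathbf{v}, \mathbf{u} + \mathbf{v} \rangle = \|\mathbf{u}\|^2 + 2\langle \mathbf{u}, \mathbf{v} \rangle + \|\mathbf{v}\|^2 = \|\mathbf{u}\|^2 + \|\mathbf{v}\|^2$$


CALCULUS REQUIRED

► EXAMPLE 5 Theorem of Pythagoras in $P_2$

In Example 4 we showed that $\mathbf{p} = x$ and $\mathbf{q} = x^2$ are orthogonal with respect to the inner product

$$\langle \mathbf{p}, \mathbf{q} \rangle = \int_{-1}^{1} p(x)q(x)\,dx$$

on $P_2$. It follows from Theorem 6.2.3 that

$$\|\mathbf{p} + \mathbf{q}\|^2 = \|\mathbf{p}\|^2 + \|\mathbf{q}\|^2$$

Thus, from the computations in Example 4, we have

$$\|\mathbf{p} + \mathbf{q}\|^2 = \left(\sqrt{\frac{2}{3}}\right)^2 + \left(\sqrt{\frac{2}{5}}\right)^2 = \frac{2}{3} + \frac{2}{5} = \frac{16}{15}$$

We can check this result by direct integration:

$$\|\mathbf{p} + \mathbf{q}\|^2 = \langle \mathbf{p} + \mathbf{q}, \mathbf{p} + \mathbf{q} \rangle = \int_{-1}^{1} (x + x^2)(x + x^2)\,dx = \int_{-1}^{1} x^2\,dx + 2\int_{-1}^{1} x^3\,dx + \int_{-1}^{1} x^4\,dx = \frac{2}{3} + 0 + \frac{2}{5} = \frac{16}{15}$$


Orthogonal Complements
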
In Section 4.8 we defined the notion of an orthogonal complement for subspaces of $R^n$, and we used that definition to establish a geometric link between the fundamental spaces of a matrix. The following definition extends that idea to general inner product spaces.


DEFINITION 2 If $W$ is a subspace of a real inner product space $V$, then the set of all vectors in $V$ that are orthogonal to every vector in $W$ is called the orthogonal complement of $W$ and is denoted by the symbol $W^\perp$.


In Theorem 4.8.6 we stated three properties of orthogonal complements in R n . The 
following theorem generalizes parts (a) and ( b ) of that theorem to general real inner 
product spaces. 


THEOREM 6.2.4 If $W$ is a subspace of a real inner product space $V$, then:

(a) $W^\perp$ is a subspace of $V$.

(b) $W \cap W^\perp = \{\mathbf{0}\}$.

Proof (a) The set $W^\perp$ contains at least the zero vector, since $\langle \mathbf{0}, \mathbf{w} \rangle = 0$ for every vector $\mathbf{w}$ in $W$. Thus, it remains to show that $W^\perp$ is closed under addition and scalar multiplication. To do this, suppose that $\mathbf{u}$ and $\mathbf{v}$ are vectors in $W^\perp$, so that for every vector $\mathbf{w}$ in $W$ we have $\langle \mathbf{u}, \mathbf{w} \rangle = 0$ and $\langle \mathbf{v}, \mathbf{w} \rangle = 0$. It follows from the additivity and homogeneity axioms of inner products that

$$\langle \mathbf{u} + \mathbf{v}, \mathbf{w} \rangle = \langle \mathbf{u}, \mathbf{w} \rangle + \langle \mathbf{v}, \mathbf{w} \rangle = 0 + 0 = 0$$

$$\langle k\mathbf{u}, \mathbf{w} \rangle = k\langle \mathbf{u}, \mathbf{w} \rangle = k(0) = 0$$

which proves that $\mathbf{u} + \mathbf{v}$ and $k\mathbf{u}$ are in $W^\perp$.

Proof (b) If $\mathbf{v}$ is any vector in both $W$ and $W^\perp$, then $\mathbf{v}$ is orthogonal to itself; that is, $\langle \mathbf{v}, \mathbf{v} \rangle = 0$. It follows from the positivity axiom for inner products that $\mathbf{v} = \mathbf{0}$.

The next theorem, which we state without proof, generalizes part (c) of Theorem 4.8.6. Note, however, that this theorem applies only to finite-dimensional inner product spaces, whereas Theorem 4.8.6 does not have this restriction.


THEOREM 6.2.5 If $W$ is a subspace of a real finite-dimensional inner product space $V$, then the orthogonal complement of $W^\perp$ is $W$; that is,

$$(W^\perp)^\perp = W$$

Theorem 6.2.5 implies that in a finite-dimensional inner product space orthogonal complements occur in pairs, each being orthogonal to the other (Figure 6.2.2).

Figure 6.2.2 Each vector in $W$ is orthogonal to each vector in $W^\perp$ and conversely.


In our study of the fundamental spaces of a matrix in Section 4.8 we showed that the row space and null space of a matrix are orthogonal complements with respect to the Euclidean inner product on $R^n$ (Theorem 4.8.7). The following example takes advantage of that fact.


► EXAMPLE 6 Basis for an Orthogonal Complement 

Let $W$ be the subspace of $R^6$ spanned by the vectors

$$\mathbf{w}_1 = (1, 3, -2, 0, 2, 0), \quad \mathbf{w}_2 = (2, 6, -5, -2, 4, -3),$$

$$\mathbf{w}_3 = (0, 0, 5, 10, 0, 15), \quad \mathbf{w}_4 = (2, 6, 0, 8, 4, 18)$$

Find a basis for the orthogonal complement of $W$.

Solution The subspace $W$ is the same as the row space of the matrix

$$A = \begin{bmatrix} 1 & 3 & -2 & 0 & 2 & 0 \\ 2 & 6 & -5 & -2 & 4 & -3 \\ 0 & 0 & 5 & 10 & 0 & 15 \\ 2 & 6 & 0 & 8 & 4 & 18 \end{bmatrix}$$

Since the row space and null space of $A$ are orthogonal complements, our problem reduces to finding a basis for the null space of this matrix. In Example 4 of Section 4.7 we showed that

$$\mathbf{v}_1 = \begin{bmatrix} -3 \\ 1 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}, \quad \mathbf{v}_2 = \begin{bmatrix} -4 \\ 0 \\ -2 \\ 1 \\ 0 \\ 0 \end{bmatrix}, \quad \mathbf{v}_3 = \begin{bmatrix} -2 \\ 0 \\ 0 \\ 0 \\ 1 \\ 0 \end{bmatrix}$$

form a basis for this null space. Expressing these vectors in comma-delimited form (to match that of $\mathbf{w}_1$, $\mathbf{w}_2$, $\mathbf{w}_3$, and $\mathbf{w}_4$), we obtain the basis vectors

$$\mathbf{v}_1 = (-3, 1, 0, 0, 0, 0), \quad \mathbf{v}_2 = (-4, 0, -2, 1, 0, 0), \quad \mathbf{v}_3 = (-2, 0, 0, 0, 1, 0)$$

You may want to check that these vectors are orthogonal to $\mathbf{w}_1$, $\mathbf{w}_2$, $\mathbf{w}_3$, and $\mathbf{w}_4$ by computing the necessary dot products. ◄
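
The check suggested at the end of Example 6 amounts to twelve dot products, which is easy to automate. The following sketch uses plain tuples and a hypothetical `dot` helper; the vectors are copied from the example.

```python
def dot(u, v):
    """Euclidean inner product (dot product) of two tuples."""
    return sum(a * b for a, b in zip(u, v))

# Spanning vectors of W and the claimed basis of the orthogonal complement.
W = [
    (1, 3, -2, 0, 2, 0),
    (2, 6, -5, -2, 4, -3),
    (0, 0, 5, 10, 0, 15),
    (2, 6, 0, 8, 4, 18),
]
N = [
    (-3, 1, 0, 0, 0, 0),
    (-4, 0, -2, 1, 0, 0),
    (-2, 0, 0, 0, 1, 0),
]

# products[i][j] is the dot product of w_(i+1) with v_(j+1).
products = [[dot(w, v) for v in N] for w in W]
all_orthogonal = all(entry == 0 for row in products for entry in row)
print(all_orthogonal)  # True
```

Because every vector in $W$ is a linear combination of the four spanning vectors, orthogonality to those four is enough to conclude orthogonality to all of $W$.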
Exercise Set 6.2


In Exercises 1-2, find the cosine of the angle between the vec- 
tors with respect to the Euclidean inner product. 

1. (a) u = (1,-3), v= (2,4) 

(b) u= (-1,5,2), v = (2,4, -9) 

(c) u = (1, 0, 1, 0), v = (-3, -3, -3, -3) 

2. (a) u= (-1,0), v = (3,8) 

(b) u = (4, 1,8), v = (1,0, -3) 

(c) u = (2, 1,7, -1), v = (4,0,0, 0) 

In Exercises 3-4, find the cosine of the angle between the vectors with respect to the standard inner product on $P_2$.

3. p = — 1 + 5x + 2x 2 , q = 2 + 4x — 9x 2 

4. p = x — x 2 , q = 7 + 3x + 3x 2 

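In Exercises 5-6, find the cosine of the angle between $A$ and $B$ with respect to the standard inner product on $M_{22}$.

5. $A = \begin{bmatrix} 2 & 6 \\ 1 & -3 \end{bmatrix}$, $B = \begin{bmatrix} 3 & 2 \\ 1 & 0 \end{bmatrix}$

6. $A = \begin{bmatrix} 2 & 4 \\ -1 & 3 \end{bmatrix}$, $B = \begin{bmatrix} -3 & 1 \\ 4 & 2 \end{bmatrix}$
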

15. If the vectors $\mathbf{u} = (1, 2)$ and $\mathbf{v} = (2, -4)$ are orthogonal with respect to the weighted Euclidean inner product $\langle \mathbf{u}, \mathbf{v} \rangle = w_1u_1v_1 + w_2u_2v_2$, what must be true of the weights $w_1$ and $w_2$?

16. Let $R^4$ have the Euclidean inner product. Find two unit vectors that are orthogonal to all three of the vectors $\mathbf{u} = (2, 1, -4, 0)$, $\mathbf{v} = (-1, -1, 2, 2)$, and $\mathbf{w} = (3, 2, 5, 4)$.

17. Do there exist scalars $k$ and $l$ such that the vectors

$\mathbf{p}_1 = 2 + kx + 6x^2$, $\mathbf{p}_2 = l + 5x + 3x^2$, $\mathbf{p}_3 = 1 + 2x + 3x^2$

are mutually orthogonal with respect to the standard inner product on $P_2$?

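18. Show that the vectors

$$\mathbf{u} = \begin{bmatrix} 3 \\ 3 \end{bmatrix} \quad\text{and}\quad \mathbf{v} = \begin{bmatrix} 5 \\ -8 \end{bmatrix}$$

are orthogonal with respect to the inner product on $R^2$ that is generated by the matrix

$$A = \begin{bmatrix} 2 & 1 \\ 1 & 1 \end{bmatrix}$$
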

In Exercises 7-8, determine whether the vectors are orthogonal with respect to the Euclidean inner product.

7. (a) $\mathbf{u} = (-1, 3, 2)$, $\mathbf{v} = (4, 2, -1)$

(b) $\mathbf{u} = (-2, -2, -2)$, $\mathbf{v} = (1, 1, 1)$

(c) $\mathbf{u} = (a, b)$, $\mathbf{v} = (-b, a)$

8. (a) $\mathbf{u} = (u_1, u_2, u_3)$, $\mathbf{v} = (0, 0, 0)$

(b) $\mathbf{u} = (-4, 6, -10, 1)$, $\mathbf{v} = (2, 1, -2, 9)$

(c) $\mathbf{u} = (a, b, c)$, $\mathbf{v} = (-c, 0, a)$

In Exercises 9-10, show that the vectors are orthogonal with 
respect to the standard inner product on P 2 . 

9. p = — 1 — x + 2x 2 , q = 2x + x 2 

10. p = 2 — 3x + x 2 , q = 4 + 2x — 2x 2 

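In Exercises 11-12, show that the matrices are orthogonal with respect to the standard inner product on $M_{22}$.

11. $U = \begin{bmatrix} 2 & 1 \\ -1 & 3 \end{bmatrix}$, $V = \begin{bmatrix} -3 & 0 \\ 0 & 2 \end{bmatrix}$

12. $U = \begin{bmatrix} 5 & -1 \\ 2 & -2 \end{bmatrix}$, $V = \begin{bmatrix} 1 & 3 \\ -1 & 0 \end{bmatrix}$
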

In Exercises 13-14, show that the vectors are not orthogonal with respect to the Euclidean inner product on $R^2$, and then find a value of $k$ for which the vectors are orthogonal with respect to the weighted Euclidean inner product $\langle \mathbf{u}, \mathbf{v} \rangle = 2u_1v_1 + ku_2v_2$.

13. $\mathbf{u} = (1, 3)$, $\mathbf{v} = (2, -1)$   14. $\mathbf{u} = (2, -4)$, $\mathbf{v} = (0, 3)$


[See Formulas (5) and (6) of Section 6. 1 .] 

19. Let $P_2$ have the evaluation inner product at the points

$x_0 = -2$, $x_1 = 0$, $x_2 = 2$

Show that the vectors $\mathbf{p} = x$ and $\mathbf{q} = x^2$ are orthogonal with respect to this inner product.

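20. Let $M_{22}$ have the standard inner product. Determine whether the matrix $A$ is in the subspace spanned by the matrices $U$ and $V$.

$$A = \begin{bmatrix} -1 & 1 \\ 0 & 2 \end{bmatrix}, \quad U = \begin{bmatrix} 1 & -1 \\ 3 & 0 \end{bmatrix}, \quad V = \begin{bmatrix} 4 & 0 \\ 9 & 2 \end{bmatrix}$$
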

In Exercises 21-24, confirm that the Cauchy-Schwarz inequal- 
ity holds for the given vectors using the stated inner product. 

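21. $\mathbf{u} = (1, 0, 3)$, $\mathbf{v} = (2, 1, -1)$ using the weighted Euclidean inner product $\langle \mathbf{u}, \mathbf{v} \rangle = 2u_1v_1 + 3u_2v_2 + u_3v_3$ in $R^3$.

22. $U = \begin{bmatrix} -1 & 2 \\ 6 & 1 \end{bmatrix}$ and $V = \begin{bmatrix} 1 & 0 \\ 3 & 3 \end{bmatrix}$

using the standard inner product on $M_{22}$.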

23. p = — 1 + 2x + x 2 and q = 2 — 4x 2 using the standard inner 
product on P 2 . 

24. The vectors

$$\mathbf{u} = \begin{bmatrix} 1 \\ 1 \end{bmatrix} \quad\text{and}\quad \mathbf{v} = \begin{bmatrix} 1 \\ -1 \end{bmatrix}$$

with respect to the inner product in Exercise 18.

25. Let $R^4$ have the Euclidean inner product, and let $\mathbf{u} = (-1, 1, 0, 2)$. Determine whether the vector $\mathbf{u}$ is orthogonal to the subspace spanned by the vectors $\mathbf{w}_1 = (1, -1, 3, 0)$ and $\mathbf{w}_2 = (4, 0, 9, 2)$.

26. Let Pi have the standard inner product, and let 

p = — 1 — x + lx 1 + 4x 3 

Determine whether p is orthogonal to the subspace spanned by 
the polynomials Wi = 2 — x 2 + x 3 and w 2 = 4x — 2x 1 + 2x 3 . 

In Exercises 27-28, find a basis for the orthogonal complement 
of the subspace of R" spanned by the vectors. 

27. V! = (1, 4. 5, 2), v 2 = (2, 1, 3. 0), v 3 = (-1, 3, 2, 2) 

28. vi = (1, 4. 5, 6, 9), v 2 = (3, -2, 1,4, -1), 

v 3 = (-1, 0. -1, -2, -1), v 4 = (2, 3, 5, 7, 8) 

In Exercises 29-30, assume that $R^n$ has the Euclidean inner product.

29. (a) Let $W$ be the line in $R^2$ with equation $y = 2x$. Find an equation for $W^\perp$.

(b) Let $W$ be the plane in $R^3$ with equation $x - 2y - 3z = 0$. Find parametric equations for $W^\perp$.

30. (a) Let $W$ be the $y$-axis in an $xyz$-coordinate system in $R^3$. Describe the subspace $W^\perp$.

(b) Let $W$ be the $yz$-plane of an $xyz$-coordinate system in $R^3$. Describe the subspace $W^\perp$.


35. (Calculus required) Let $C[0, 1]$ have the inner product in Exercise 31.

(a) Show that the vectors

$\mathbf{p} = p(x) = 1$ and $\mathbf{q} = q(x) = \tfrac{1}{2} - x$

are orthogonal.

(b) Show that the vectors in part (a) satisfy the Theorem of Pythagoras.

36. (Calculus required) Let $C[-1, 1]$ have the inner product in Exercise 33.

(a) Show that the vectors

$\mathbf{p} = p(x) = x$ and $\mathbf{q} = q(x) = x^2 - 1$

are orthogonal.

(b) Show that the vectors in part (a) satisfy the Theorem of Pythagoras.

37. Let $V$ be an inner product space. Show that if $\mathbf{u}$ and $\mathbf{v}$ are orthogonal unit vectors in $V$, then $\|\mathbf{u} - \mathbf{v}\| = \sqrt{2}$.

38. Let $V$ be an inner product space. Show that if $\mathbf{w}$ is orthogonal to both $\mathbf{u}_1$ and $\mathbf{u}_2$, then it is orthogonal to $k_1\mathbf{u}_1 + k_2\mathbf{u}_2$ for all scalars $k_1$ and $k_2$. Interpret this result geometrically in the case where $V$ is $R^3$ with the Euclidean inner product.

39. ( Calculus required) Let C[0, 7r] have the inner product 

(f. g) = f f(x)g(x)dx 
Jo 


31. ( Calculus required) Let C[0, 1] have the integral inner product 


p.q > = / 

Jo 


(p. q>= / p(x)q(x)dx 

and let p = p(x) = x and q = q(x) = x 2 . 

(a) Find (p, q). 

(b) Find ||p|| and ||q||. 


and let f„ = cos nx (n = 0, 1, 2, . . .). Show that if k ^ Z, then 
L and f i are orthogonal vectors. 

40. As illustrated in the accompanying figure, the vectors $\mathbf{u} = (1, \sqrt{3})$ and $\mathbf{v} = (-1, \sqrt{3})$ have norm 2 and an angle of 60° between them relative to the Euclidean inner product. Find a weighted Euclidean inner product with respect to which $\mathbf{u}$ and $\mathbf{v}$ are orthogonal unit vectors.


32. (a) Find the cosine of the angle between the vectors p and q 

in Exercise 3 1 . 

(b) Find the distance between the vectors p and q in Exer- 
cise 3 1 . 

33. ( Calculus required) Let C[— 1, 1] have the integral inner 
product 

(P. q) = J p(x)q(x)dx 

and let p = p(x) — x 2 — x and q = q(x) — x + 1. 

(a) Find (p, q). 

(b) Find ||p|| and ||q||. 

34. (a) Find the cosine of the angle between the vectors p and q 

in Exercise 33. 

(b) Find the distance between the vectors p and q in Exer- 
cise 33. 



Figure Ex-40 


Working with Proofs 

41. Let $V$ be an inner product space. Prove that if $\mathbf{w}$ is orthogonal to each of the vectors $\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_r$, then it is orthogonal to every vector in $\text{span}\{\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_r\}$.

42. Let $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r\}$ be a basis for an inner product space $V$. Prove that the zero vector is the only vector in $V$ that is orthogonal to all of the basis vectors.

43. Let $\{\mathbf{w}_1, \mathbf{w}_2, \ldots, \mathbf{w}_k\}$ be a basis for a subspace $W$ of $V$. Prove that $W^\perp$ consists of all vectors in $V$ that are orthogonal to every basis vector.

44. Prove the following generalization of Theorem 6.2.3: If $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r$ are pairwise orthogonal vectors in an inner product space $V$, then

$\|\mathbf{v}_1 + \mathbf{v}_2 + \cdots + \mathbf{v}_r\|^2 = \|\mathbf{v}_1\|^2 + \|\mathbf{v}_2\|^2 + \cdots + \|\mathbf{v}_r\|^2$

45. Prove: If $\mathbf{u}$ and $\mathbf{v}$ are $n \times 1$ matrices and $A$ is an $n \times n$ matrix, then

$(\mathbf{v}^T A^T A \mathbf{u})^2 \le (\mathbf{u}^T A^T A \mathbf{u})(\mathbf{v}^T A^T A \mathbf{v})$

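(a) Assuming that $P_2$ has the standard inner product, find all vectors $\mathbf{q}$ in $P_2$ such that $\langle \mathbf{p}, \mathbf{q} \rangle = \langle T(\mathbf{p}), T(\mathbf{q}) \rangle$.

(b) Assuming that $P_2$ has the evaluation inner product at the points $x_0 = -1$, $x_1 = 0$, $x_2 = 1$, find all vectors $\mathbf{q}$ in $P_2$ such that $\langle \mathbf{p}, \mathbf{q} \rangle = \langle T(\mathbf{p}), T(\mathbf{q}) \rangle$.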

True-False Exercises 

TF. In parts (a)-(f) determine whether the statement is true or 

false, and justify your answer. 

(a) If $\mathbf{u}$ is orthogonal to every vector of a subspace $W$, then $\mathbf{u} = \mathbf{0}$.


46. Use the Cauchy-Schwarz inequality to prove that for all real values of $a$, $b$, and $\theta$,

$(a\cos\theta + b\sin\theta)^2 \le a^2 + b^2$

47. Prove: If $w_1, w_2, \ldots, w_n$ are positive real numbers, and if $\mathbf{u} = (u_1, u_2, \ldots, u_n)$ and $\mathbf{v} = (v_1, v_2, \ldots, v_n)$ are any two vectors in $R^n$, then

$|w_1u_1v_1 + w_2u_2v_2 + \cdots + w_nu_nv_n| \le (w_1u_1^2 + w_2u_2^2 + \cdots + w_nu_n^2)^{1/2}(w_1v_1^2 + w_2v_2^2 + \cdots + w_nv_n^2)^{1/2}$


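(b) If $\mathbf{u}$ is a vector in both $W$ and $W^\perp$, then $\mathbf{u} = \mathbf{0}$.

(c) If $\mathbf{u}$ and $\mathbf{v}$ are vectors in $W^\perp$, then $\mathbf{u} + \mathbf{v}$ is in $W^\perp$.

(d) If $\mathbf{u}$ is a vector in $W^\perp$ and $k$ is a real number, then $k\mathbf{u}$ is in $W^\perp$.

(e) If $\mathbf{u}$ and $\mathbf{v}$ are orthogonal, then $|\langle \mathbf{u}, \mathbf{v} \rangle| = \|\mathbf{u}\|\,\|\mathbf{v}\|$.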


48. Prove that equality holds in the Cauchy-Schwarz inequality if 
and only if u and v are linearly dependent. 

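49. (Calculus required) Let $f(x)$ and $g(x)$ be continuous functions on $[0, 1]$. Prove:

(a) $\left[\int_0^1 f(x)g(x)\,dx\right]^2 \le \left[\int_0^1 f^2(x)\,dx\right]\left[\int_0^1 g^2(x)\,dx\right]$

(b) $\left[\int_0^1 [f(x) + g(x)]^2\,dx\right]^{1/2} \le \left[\int_0^1 f^2(x)\,dx\right]^{1/2} + \left[\int_0^1 g^2(x)\,dx\right]^{1/2}$

[Hint: Use the Cauchy-Schwarz inequality.]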


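50. Prove that Formula (4) holds for all nonzero vectors u and v
in a real inner product space $V$.

51. Let $T_A: R^2 \to R^2$ be multiplication by

$$A = \begin{bmatrix} 1 & 1 \\ -1 & 1 \end{bmatrix}$$

and let $\mathbf{x} = (1, 1)$.

(a) Assuming that $R^2$ has the Euclidean inner product, find
all vectors $\mathbf{v}$ in $R^2$ such that $\langle \mathbf{x}, \mathbf{v} \rangle = \langle T_A(\mathbf{x}), T_A(\mathbf{v}) \rangle$.

(b) Assuming that $R^2$ has the weighted Euclidean inner product
$\langle \mathbf{u}, \mathbf{v} \rangle = 2u_1v_1 + 3u_2v_2$, find all vectors $\mathbf{v}$ in $R^2$ such
that $\langle \mathbf{x}, \mathbf{v} \rangle = \langle T_A(\mathbf{x}), T_A(\mathbf{v}) \rangle$.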

52. Let $T: P_2 \to P_2$ be the linear transformation defined by

$$T(a + bx + cx^2) = 3a - cx^2$$

and let $\mathbf{p} = 1 + x$.


(f ) If u and v are orthogonal, then ||u + v|| = ||u|| + ||v||. 

Working with Technology 

T1. (a) We know that the row space and null space of a matrix
are orthogonal complements relative to the Euclidean inner
product. Confirm this fact for the matrix

$$A = \begin{bmatrix} 2 & -1 & 3 & 5 \\ 4 & -3 & 1 & 3 \\ 3 & -2 & 3 & 4 \\ 4 & -1 & 15 & 17 \\ 7 & -6 & -7 & 0 \end{bmatrix}$$


(b) Find a basis for the orthogonal complement of the column 
space of A. 

T2. In each part, confirm that the vectors u and v satisfy the 
Cauchy-Schwarz inequality relative to the stated inner product. 

(a) M 44 with the standard inner product. 



-1 

0 

2 

0" 


- 2 

2 

1 

3- 


0 

-1 

0 

1 


3 

-1 

0 

1 

u = 

3 

0 

0 

2 

and v = 

1 

0 

0 

-2 


.0 

4 

-3 

0_ 


.-3 

1 

2 

0_ 


(b) R 4 with the weighted Euclidean inner product with weights 
V)\ = \,w 2 = u> 3 = |, u> 4 = 

u= (1,-2, 2,1) and v = (0, -3, 3, -2) 
6.3 Gram-Schmidt Process; QR-Decomposition

In many problems involving vector spaces, the problem solver is free to choose any basis for 
the vector space that seems appropriate. In inner product spaces, the solution of a problem 
can often be simplified by choosing a basis in which the vectors are orthogonal to one 
another. In this section we will show how such bases can be obtained. 


Orthogonal and 
Orthonormal Sets 


Recall from Section 6.2 that two vectors in an inner product space are said to be orthogonal 
if their inner product is zero. The following definition extends the notion of orthogonality 
to sets of vectors in an inner product space. 


DEFINITION 1 A set of two or more vectors in a real inner product space is said to be 
orthogonal if all pairs of distinct vectors in the set are orthogonal. An orthogonal set 
in which each vector has norm 1 is said to be orthonormal. 


► EXAMPLE 1 An Orthogonal Set in R 3 

Let 

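$$\mathbf{v}_1 = (0, 1, 0), \quad \mathbf{v}_2 = (1, 0, 1), \quad \mathbf{v}_3 = (1, 0, -1)$$

and assume that $R^3$ has the Euclidean inner product. It follows that the set of vectors
$S = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ is orthogonal since $\langle \mathbf{v}_1, \mathbf{v}_2 \rangle = \langle \mathbf{v}_1, \mathbf{v}_3 \rangle = \langle \mathbf{v}_2, \mathbf{v}_3 \rangle = 0$. ◄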


Note that Formula (1) is identical to Formula (4) of Section 3.2, but whereas
Formula (4) was valid only for vectors in $R^n$ with the Euclidean inner product,
Formula (1) is valid in general inner product spaces.


It frequently happens that one has found a set of orthogonal vectors in an inner 
product space but what is actually needed is a set of orthonormal vectors. A simple way 
to convert an orthogonal set of nonzero vectors into an orthonormal set is to multiply 
each vector v in the orthogonal set by the reciprocal of its length to create a vector of 
norm 1 (called a unit vector). To see why this works, suppose that v is a nonzero vector 
in an inner product space, and let 

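$$\mathbf{u} = \frac{1}{\|\mathbf{v}\|}\mathbf{v}$$

Then it follows from Theorem 6.1.1(b) with $k = 1/\|\mathbf{v}\|$ that

$$\|\mathbf{u}\| = \left\|\frac{1}{\|\mathbf{v}\|}\mathbf{v}\right\| = \left|\frac{1}{\|\mathbf{v}\|}\right|\|\mathbf{v}\| = \frac{1}{\|\mathbf{v}\|}\|\mathbf{v}\| = 1 \tag{1}$$
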


This process of multiplying a vector v by the reciprocal of its length is called normalizing v. 
We leave it as an exercise to show that normalizing the vectors in an orthogonal set of 
nonzero vectors preserves the orthogonality of the vectors and produces an orthonormal 
set. 


► EXAMPLE 2 Constructing an Orthonormal Set 

The Euclidean norms of the vectors in Example 1 are 


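$$\|\mathbf{v}_1\| = 1, \quad \|\mathbf{v}_2\| = \sqrt{2}, \quad \|\mathbf{v}_3\| = \sqrt{2}$$

Consequently, normalizing $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$ yields

$$\mathbf{u}_1 = \frac{\mathbf{v}_1}{\|\mathbf{v}_1\|} = (0, 1, 0), \quad \mathbf{u}_2 = \frac{\mathbf{v}_2}{\|\mathbf{v}_2\|} = \left(\frac{1}{\sqrt{2}}, 0, \frac{1}{\sqrt{2}}\right), \quad \mathbf{u}_3 = \frac{\mathbf{v}_3}{\|\mathbf{v}_3\|} = \left(\frac{1}{\sqrt{2}}, 0, -\frac{1}{\sqrt{2}}\right)$$

We leave it for you to verify that the set $S = \{\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3\}$ is orthonormal by showing that

$$\langle \mathbf{u}_1, \mathbf{u}_2 \rangle = \langle \mathbf{u}_1, \mathbf{u}_3 \rangle = \langle \mathbf{u}_2, \mathbf{u}_3 \rangle = 0 \quad \text{and} \quad \|\mathbf{u}_1\| = \|\mathbf{u}_2\| = \|\mathbf{u}_3\| = 1$$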
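
The normalization and the verification left to the reader can also be checked numerically. The following is a minimal sketch in Python, assuming the Euclidean inner product on $R^3$; the helper names `dot`, `normalize`, and `is_orthonormal` are illustrative choices and do not come from the text.

```python
import math

def dot(u, v):
    """Euclidean inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    """Multiply v by the reciprocal of its Euclidean norm."""
    norm = math.sqrt(dot(v, v))
    if norm == 0:
        raise ValueError("the zero vector cannot be normalized")
    return tuple(a / norm for a in v)

def is_orthonormal(vectors, tol=1e-12):
    """Check <u_i, u_j> = 0 for i != j and <u_i, u_i> = 1, up to rounding."""
    for i, u in enumerate(vectors):
        for j, w in enumerate(vectors):
            expected = 1.0 if i == j else 0.0
            if abs(dot(u, w) - expected) > tol:
                return False
    return True

# The orthogonal set of Example 1 and its normalized version.
v1, v2, v3 = (0, 1, 0), (1, 0, 1), (1, 0, -1)
u1, u2, u3 = (normalize(v) for v in (v1, v2, v3))
```

Because the computation uses floating-point square roots, the check compares against 0 and 1 with a small tolerance rather than testing exact equality.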


In R 2 any two nonzero perpendicular vectors are linearly independent because neither 
is a scalar multiple of the other; and in R 3 any three nonzero mutually perpendicular 
vectors are linearly independent because no one lies in the plane of the other two (and 
hence is not expressible as a linear combination of the other two). The following theorem 
generalizes these observations. 


THEOREM 6.3.1 If $S = \{\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_n\}$ is an orthogonal set of nonzero vectors in an
inner product space, then $S$ is linearly independent.


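Proof Assume that

$$k_1\mathbf{v}_1 + k_2\mathbf{v}_2 + \cdots + k_n\mathbf{v}_n = \mathbf{0} \tag{2}$$

To demonstrate that $S = \{\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_n\}$ is linearly independent, we must prove that
$k_1 = k_2 = \cdots = k_n = 0$.

For each $\mathbf{v}_i$ in $S$, it follows from (2) that

$$\langle k_1\mathbf{v}_1 + k_2\mathbf{v}_2 + \cdots + k_n\mathbf{v}_n, \mathbf{v}_i \rangle = \langle \mathbf{0}, \mathbf{v}_i \rangle = 0$$

or, equivalently,

$$k_1\langle \mathbf{v}_1, \mathbf{v}_i \rangle + k_2\langle \mathbf{v}_2, \mathbf{v}_i \rangle + \cdots + k_n\langle \mathbf{v}_n, \mathbf{v}_i \rangle = 0$$

From the orthogonality of $S$ it follows that $\langle \mathbf{v}_j, \mathbf{v}_i \rangle = 0$ when $j \ne i$, so this equation
reduces to

$$k_i\langle \mathbf{v}_i, \mathbf{v}_i \rangle = 0$$

Since the vectors in $S$ are assumed to be nonzero, it follows from the positivity axiom
for inner products that $\langle \mathbf{v}_i, \mathbf{v}_i \rangle \ne 0$. Thus, the preceding equation implies that each $k_i$ in
Equation (2) is zero, which is what we wanted to prove.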


Since an orthonormal set is orthogonal, and since its vectors are nonzero (norm 1),
it follows from Theorem 6.3.1 that every orthonormal set is linearly independent.


In an inner product space, a basis consisting of orthonormal vectors is called an 
orthonormal basis, and a basis consisting of orthogonal vectors is called an orthogonal 
basis. A familiar example of an orthonormal basis is the standard basis for R" with the 
Euclidean inner product: 

$$\mathbf{e}_1 = (1, 0, 0, \dots, 0), \quad \mathbf{e}_2 = (0, 1, 0, \dots, 0), \quad \dots, \quad \mathbf{e}_n = (0, 0, 0, \dots, 1)$$


► EXAMPLE 3 An Orthonormal Basis for P n 

Recall from Example 7 of Section 6.1 that the standard inner product of the polynomials

$$\mathbf{p} = a_0 + a_1x + \cdots + a_nx^n \quad \text{and} \quad \mathbf{q} = b_0 + b_1x + \cdots + b_nx^n$$

is

$$\langle \mathbf{p}, \mathbf{q} \rangle = a_0b_0 + a_1b_1 + \cdots + a_nb_n$$

and the norm of $\mathbf{p}$ relative to this inner product is

$$\|\mathbf{p}\| = \sqrt{\langle \mathbf{p}, \mathbf{p} \rangle} = \sqrt{a_0^2 + a_1^2 + \cdots + a_n^2}$$

You should be able to see from these formulas that the standard basis

$$S = \{1, x, x^2, \dots, x^n\}$$

is orthonormal with respect to this inner product. 
Coordinates Relative to
Orthonormal Bases 


► EXAMPLE 4 An Orthonormal Basis 

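In Example 2 we showed that the vectors

$$\mathbf{u}_1 = (0, 1, 0), \quad \mathbf{u}_2 = \left(\frac{1}{\sqrt{2}}, 0, \frac{1}{\sqrt{2}}\right), \quad \text{and} \quad \mathbf{u}_3 = \left(\frac{1}{\sqrt{2}}, 0, -\frac{1}{\sqrt{2}}\right)$$

form an orthonormal set with respect to the Euclidean inner product on $R^3$. By Theorem
6.3.1, these vectors form a linearly independent set, and since $R^3$ is three-dimensional,
it follows from Theorem 4.5.4 that $S = \{\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3\}$ is an orthonormal basis for $R^3$.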


One way to express a vector u as a linear combination of basis vectors 

$$S = \{\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_n\}$$

is to convert the vector equation

$$\mathbf{u} = c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_n\mathbf{v}_n$$

to a linear system and solve for the coefficients $c_1, c_2, \dots, c_n$. However, if the basis
happens to be orthogonal or orthonormal, then the following theorem shows that the 
coefficients can be obtained more simply by computing appropriate inner products. 


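THEOREM 6.3.2

(a) If $S = \{\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_n\}$ is an orthogonal basis for an inner product space $V$, and if
$\mathbf{u}$ is any vector in $V$, then

$$\mathbf{u} = \frac{\langle \mathbf{u}, \mathbf{v}_1 \rangle}{\|\mathbf{v}_1\|^2}\mathbf{v}_1 + \frac{\langle \mathbf{u}, \mathbf{v}_2 \rangle}{\|\mathbf{v}_2\|^2}\mathbf{v}_2 + \cdots + \frac{\langle \mathbf{u}, \mathbf{v}_n \rangle}{\|\mathbf{v}_n\|^2}\mathbf{v}_n \tag{3}$$

(b) If $S = \{\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_n\}$ is an orthonormal basis for an inner product space $V$, and
if $\mathbf{u}$ is any vector in $V$, then

$$\mathbf{u} = \langle \mathbf{u}, \mathbf{v}_1 \rangle\mathbf{v}_1 + \langle \mathbf{u}, \mathbf{v}_2 \rangle\mathbf{v}_2 + \cdots + \langle \mathbf{u}, \mathbf{v}_n \rangle\mathbf{v}_n \tag{4}$$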


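Proof (a) Since $S = \{\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_n\}$ is a basis for $V$, every vector $\mathbf{u}$ in $V$ can be expressed
in the form

$$\mathbf{u} = c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_n\mathbf{v}_n$$

We will complete the proof by showing that

$$c_i = \frac{\langle \mathbf{u}, \mathbf{v}_i \rangle}{\|\mathbf{v}_i\|^2} \tag{5}$$

for $i = 1, 2, \dots, n$. To do this, observe first that

$$\langle \mathbf{u}, \mathbf{v}_i \rangle = \langle c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_n\mathbf{v}_n, \mathbf{v}_i \rangle = c_1\langle \mathbf{v}_1, \mathbf{v}_i \rangle + c_2\langle \mathbf{v}_2, \mathbf{v}_i \rangle + \cdots + c_n\langle \mathbf{v}_n, \mathbf{v}_i \rangle$$

Since $S$ is an orthogonal set, all of the inner products in the last equality are zero except
the $i$th, so we have

$$\langle \mathbf{u}, \mathbf{v}_i \rangle = c_i\langle \mathbf{v}_i, \mathbf{v}_i \rangle = c_i\|\mathbf{v}_i\|^2$$

Solving this equation for $c_i$ yields (5), which completes the proof.

Proof (b) In this case, $\|\mathbf{v}_1\| = \|\mathbf{v}_2\| = \cdots = \|\mathbf{v}_n\| = 1$, so Formula (3) simplifies to
Formula (4).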
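Using the terminology and notation from Definition 2 of Section 4.4, it follows from
Theorem 6.3.2 that the coordinate vector of a vector $\mathbf{u}$ in $V$ relative to an orthogonal
basis $S = \{\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_n\}$ is

$$(\mathbf{u})_S = \left(\frac{\langle \mathbf{u}, \mathbf{v}_1 \rangle}{\|\mathbf{v}_1\|^2}, \frac{\langle \mathbf{u}, \mathbf{v}_2 \rangle}{\|\mathbf{v}_2\|^2}, \dots, \frac{\langle \mathbf{u}, \mathbf{v}_n \rangle}{\|\mathbf{v}_n\|^2}\right) \tag{6}$$

and relative to an orthonormal basis $S = \{\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_n\}$ is

$$(\mathbf{u})_S = (\langle \mathbf{u}, \mathbf{v}_1 \rangle, \langle \mathbf{u}, \mathbf{v}_2 \rangle, \dots, \langle \mathbf{u}, \mathbf{v}_n \rangle) \tag{7}$$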


► EXAMPLE 5 A Coordinate Vector Relative to an Orthonormal Basis 

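Let

$$\mathbf{v}_1 = (0, 1, 0), \quad \mathbf{v}_2 = \left(-\tfrac{4}{5}, 0, \tfrac{3}{5}\right), \quad \mathbf{v}_3 = \left(\tfrac{3}{5}, 0, \tfrac{4}{5}\right)$$

It is easy to check that $S = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ is an orthonormal basis for $R^3$ with the Euclidean
inner product. Express the vector $\mathbf{u} = (1, 1, 1)$ as a linear combination of the vectors in
$S$, and find the coordinate vector $(\mathbf{u})_S$.


Solution We leave it for you to verify that

$$\langle \mathbf{u}, \mathbf{v}_1 \rangle = 1, \quad \langle \mathbf{u}, \mathbf{v}_2 \rangle = -\tfrac{1}{5}, \quad \text{and} \quad \langle \mathbf{u}, \mathbf{v}_3 \rangle = \tfrac{7}{5}$$

Therefore, by Theorem 6.3.2 we have

$$\mathbf{u} = \mathbf{v}_1 - \tfrac{1}{5}\mathbf{v}_2 + \tfrac{7}{5}\mathbf{v}_3$$

that is,

$$(1, 1, 1) = (0, 1, 0) - \tfrac{1}{5}\left(-\tfrac{4}{5}, 0, \tfrac{3}{5}\right) + \tfrac{7}{5}\left(\tfrac{3}{5}, 0, \tfrac{4}{5}\right)$$

Thus, the coordinate vector of $\mathbf{u}$ relative to $S$ is

$$(\mathbf{u})_S = (\langle \mathbf{u}, \mathbf{v}_1 \rangle, \langle \mathbf{u}, \mathbf{v}_2 \rangle, \langle \mathbf{u}, \mathbf{v}_3 \rangle) = \left(1, -\tfrac{1}{5}, \tfrac{7}{5}\right)$$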
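
The coordinate computation in Example 5 can be reproduced with exact rational arithmetic. The sketch below assumes the Euclidean inner product and takes $\mathbf{v}_2 = (-\tfrac{4}{5}, 0, \tfrac{3}{5})$ and $\mathbf{v}_3 = (\tfrac{3}{5}, 0, \tfrac{4}{5})$; the function name `coordinates` is a hypothetical helper that applies Formula (6), which reduces to Formula (7) when the basis is orthonormal.

```python
from fractions import Fraction

def dot(u, v):
    """Euclidean inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

def coordinates(u, basis):
    """Coordinates of u relative to an orthogonal basis: <u, v_i> / ||v_i||^2."""
    return [Fraction(dot(u, v)) / Fraction(dot(v, v)) for v in basis]

# Orthonormal basis and vector from Example 5 (values assumed as stated above).
v1 = (0, 1, 0)
v2 = (Fraction(-4, 5), 0, Fraction(3, 5))
v3 = (Fraction(3, 5), 0, Fraction(4, 5))
u = (1, 1, 1)

coords = coordinates(u, [v1, v2, v3])

# Rebuild u from its coordinates as a consistency check.
rebuilt = [sum(c * v[k] for c, v in zip(coords, (v1, v2, v3))) for k in range(3)]
```

No linear system is solved here: each coordinate comes from a single inner product, which is the point of Theorem 6.3.2.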


► EXAMPLE 6 An Orthonormal Basis from an Orthogonal Basis 

(a) Show that the vectors 

w 1 = (0, 2, 0), w 2 = (3, 0, 3), w 3 = (-4, 0, 4) 

form an orthogonal basis for R 3 with the Euclidean inner product, and use that 
basis to find an orthonormal basis by normalizing each vector. 

(b) Express the vector u = (1, 2, 4) as a linear combination of the orthonormal basis 
vectors obtained in part (a). 

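Solution (a) The given vectors form an orthogonal set since

$$\langle \mathbf{w}_1, \mathbf{w}_2 \rangle = 0, \quad \langle \mathbf{w}_1, \mathbf{w}_3 \rangle = 0, \quad \langle \mathbf{w}_2, \mathbf{w}_3 \rangle = 0$$

It follows from Theorem 6.3.1 that these vectors are linearly independent and hence form
a basis for $R^3$ by Theorem 4.5.4. We leave it for you to calculate the norms of $\mathbf{w}_1$, $\mathbf{w}_2$,
and $\mathbf{w}_3$ and then obtain the orthonormal basis

$$\mathbf{v}_1 = \frac{\mathbf{w}_1}{\|\mathbf{w}_1\|} = (0, 1, 0), \quad \mathbf{v}_2 = \frac{\mathbf{w}_2}{\|\mathbf{w}_2\|} = \left(\frac{1}{\sqrt{2}}, 0, \frac{1}{\sqrt{2}}\right), \quad \mathbf{v}_3 = \frac{\mathbf{w}_3}{\|\mathbf{w}_3\|} = \left(-\frac{1}{\sqrt{2}}, 0, \frac{1}{\sqrt{2}}\right)$$

Solution (b) It follows from Formula (4) that

$$\mathbf{u} = \langle \mathbf{u}, \mathbf{v}_1 \rangle\mathbf{v}_1 + \langle \mathbf{u}, \mathbf{v}_2 \rangle\mathbf{v}_2 + \langle \mathbf{u}, \mathbf{v}_3 \rangle\mathbf{v}_3$$

We leave it for you to confirm that

$$\langle \mathbf{u}, \mathbf{v}_1 \rangle = (1, 2, 4) \cdot (0, 1, 0) = 2$$

$$\langle \mathbf{u}, \mathbf{v}_2 \rangle = (1, 2, 4) \cdot \left(\frac{1}{\sqrt{2}}, 0, \frac{1}{\sqrt{2}}\right) = \frac{5}{\sqrt{2}}$$

$$\langle \mathbf{u}, \mathbf{v}_3 \rangle = (1, 2, 4) \cdot \left(-\frac{1}{\sqrt{2}}, 0, \frac{1}{\sqrt{2}}\right) = \frac{3}{\sqrt{2}}$$

and hence that

$$(1, 2, 4) = 2(0, 1, 0) + \frac{5}{\sqrt{2}}\left(\frac{1}{\sqrt{2}}, 0, \frac{1}{\sqrt{2}}\right) + \frac{3}{\sqrt{2}}\left(-\frac{1}{\sqrt{2}}, 0, \frac{1}{\sqrt{2}}\right)$$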


Orthogonal Projections

Many applied problems are best solved by working with orthogonal or orthonormal
basis vectors. Such bases are typically found by starting with some simple basis (say a
standard basis) and then converting that basis into an orthogonal or orthonormal basis.
To explain exactly how that is done will require some preliminary ideas about orthogonal
projections.

In Section 3.3 we proved a result called the Projection Theorem (see Theorem 3.3.2)
that dealt with the problem of decomposing a vector $\mathbf{u}$ in $R^n$ into a sum of two terms,
$\mathbf{w}_1$ and $\mathbf{w}_2$, in which $\mathbf{w}_1$ is the orthogonal projection of $\mathbf{u}$ on some nonzero vector $\mathbf{a}$ and
$\mathbf{w}_2$ is orthogonal to $\mathbf{w}_1$ (Figure 3.3.2). That result is a special case of the following more
general theorem, which we will state without proof.


THEOREM 6.3.3 Projection Theorem

If $W$ is a finite-dimensional subspace of an inner product space $V$, then every vector $\mathbf{u}$
in $V$ can be expressed in exactly one way as

$$\mathbf{u} = \mathbf{w}_1 + \mathbf{w}_2 \tag{8}$$

where $\mathbf{w}_1$ is in $W$ and $\mathbf{w}_2$ is in $W^\perp$.


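The vectors $\mathbf{w}_1$ and $\mathbf{w}_2$ in Formula (8) are commonly denoted by

$$\mathbf{w}_1 = \operatorname{proj}_W \mathbf{u} \quad \text{and} \quad \mathbf{w}_2 = \operatorname{proj}_{W^\perp} \mathbf{u} \tag{9}$$

These are called the orthogonal projection of $\mathbf{u}$ on $W$ and the orthogonal projection of $\mathbf{u}$
on $W^\perp$, respectively. The vector $\mathbf{w}_2$ is also called the component of $\mathbf{u}$ orthogonal to $W$.
Using the notation in (9), Formula (8) can be expressed as

$$\mathbf{u} = \operatorname{proj}_W \mathbf{u} + \operatorname{proj}_{W^\perp} \mathbf{u} \tag{10}$$

(Figure 6.3.1). Moreover, since $\operatorname{proj}_{W^\perp} \mathbf{u} = \mathbf{u} - \operatorname{proj}_W \mathbf{u}$, we can also express Formula
(10) as

$$\mathbf{u} = \operatorname{proj}_W \mathbf{u} + (\mathbf{u} - \operatorname{proj}_W \mathbf{u}) \tag{11}$$

▲ Figure 6.3.1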
Although Formulas (12) and (13) are expressed in terms of orthogonal and orthonormal
basis vectors, the resulting vector $\operatorname{proj}_W \mathbf{u}$ does not depend on the basis vectors that are used.


A Geometric Interpretation 
of Orthogonal Projections 


The following theorem provides formulas for calculating orthogonal projections. 


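THEOREM 6.3.4 Let $W$ be a finite-dimensional subspace of an inner product space $V$.

(a) If $\{\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_r\}$ is an orthogonal basis for $W$, and $\mathbf{u}$ is any vector in $V$, then

$$\operatorname{proj}_W \mathbf{u} = \frac{\langle \mathbf{u}, \mathbf{v}_1 \rangle}{\|\mathbf{v}_1\|^2}\mathbf{v}_1 + \frac{\langle \mathbf{u}, \mathbf{v}_2 \rangle}{\|\mathbf{v}_2\|^2}\mathbf{v}_2 + \cdots + \frac{\langle \mathbf{u}, \mathbf{v}_r \rangle}{\|\mathbf{v}_r\|^2}\mathbf{v}_r \tag{12}$$

(b) If $\{\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_r\}$ is an orthonormal basis for $W$, and $\mathbf{u}$ is any vector in $V$, then

$$\operatorname{proj}_W \mathbf{u} = \langle \mathbf{u}, \mathbf{v}_1 \rangle\mathbf{v}_1 + \langle \mathbf{u}, \mathbf{v}_2 \rangle\mathbf{v}_2 + \cdots + \langle \mathbf{u}, \mathbf{v}_r \rangle\mathbf{v}_r \tag{13}$$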


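Proof (a) It follows from Theorem 6.3.3 that the vector $\mathbf{u}$ can be expressed in the form
$\mathbf{u} = \mathbf{w}_1 + \mathbf{w}_2$, where $\mathbf{w}_1 = \operatorname{proj}_W \mathbf{u}$ is in $W$ and $\mathbf{w}_2$ is in $W^\perp$; and it follows from
Theorem 6.3.2 that the component $\operatorname{proj}_W \mathbf{u} = \mathbf{w}_1$ can be expressed in terms of the basis vectors
for $W$ as

$$\operatorname{proj}_W \mathbf{u} = \mathbf{w}_1 = \frac{\langle \mathbf{w}_1, \mathbf{v}_1 \rangle}{\|\mathbf{v}_1\|^2}\mathbf{v}_1 + \frac{\langle \mathbf{w}_1, \mathbf{v}_2 \rangle}{\|\mathbf{v}_2\|^2}\mathbf{v}_2 + \cdots + \frac{\langle \mathbf{w}_1, \mathbf{v}_r \rangle}{\|\mathbf{v}_r\|^2}\mathbf{v}_r \tag{14}$$

Since $\mathbf{w}_2$ is orthogonal to $W$, it follows that

$$\langle \mathbf{w}_2, \mathbf{v}_1 \rangle = \langle \mathbf{w}_2, \mathbf{v}_2 \rangle = \cdots = \langle \mathbf{w}_2, \mathbf{v}_r \rangle = 0$$

so we can rewrite (14) as

$$\operatorname{proj}_W \mathbf{u} = \mathbf{w}_1 = \frac{\langle \mathbf{w}_1 + \mathbf{w}_2, \mathbf{v}_1 \rangle}{\|\mathbf{v}_1\|^2}\mathbf{v}_1 + \frac{\langle \mathbf{w}_1 + \mathbf{w}_2, \mathbf{v}_2 \rangle}{\|\mathbf{v}_2\|^2}\mathbf{v}_2 + \cdots + \frac{\langle \mathbf{w}_1 + \mathbf{w}_2, \mathbf{v}_r \rangle}{\|\mathbf{v}_r\|^2}\mathbf{v}_r$$

or, equivalently, as

$$\operatorname{proj}_W \mathbf{u} = \mathbf{w}_1 = \frac{\langle \mathbf{u}, \mathbf{v}_1 \rangle}{\|\mathbf{v}_1\|^2}\mathbf{v}_1 + \frac{\langle \mathbf{u}, \mathbf{v}_2 \rangle}{\|\mathbf{v}_2\|^2}\mathbf{v}_2 + \cdots + \frac{\langle \mathbf{u}, \mathbf{v}_r \rangle}{\|\mathbf{v}_r\|^2}\mathbf{v}_r$$

Proof (b) In this case, $\|\mathbf{v}_1\| = \|\mathbf{v}_2\| = \cdots = \|\mathbf{v}_r\| = 1$, so Formula (12) simplifies to
Formula (13).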


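► EXAMPLE 7 Calculating Projections

Let $R^3$ have the Euclidean inner product, and let $W$ be the subspace spanned by the
orthonormal vectors $\mathbf{v}_1 = (0, 1, 0)$ and $\mathbf{v}_2 = \left(-\tfrac{4}{5}, 0, \tfrac{3}{5}\right)$. From Formula (13) the
orthogonal projection of $\mathbf{u} = (1, 1, 1)$ on $W$ is

$$\operatorname{proj}_W \mathbf{u} = \langle \mathbf{u}, \mathbf{v}_1 \rangle\mathbf{v}_1 + \langle \mathbf{u}, \mathbf{v}_2 \rangle\mathbf{v}_2 = (1)(0, 1, 0) + \left(-\tfrac{1}{5}\right)\left(-\tfrac{4}{5}, 0, \tfrac{3}{5}\right) = \left(\tfrac{4}{25}, 1, -\tfrac{3}{25}\right)$$

The component of $\mathbf{u}$ orthogonal to $W$ is

$$\operatorname{proj}_{W^\perp} \mathbf{u} = \mathbf{u} - \operatorname{proj}_W \mathbf{u} = (1, 1, 1) - \left(\tfrac{4}{25}, 1, -\tfrac{3}{25}\right) = \left(\tfrac{21}{25}, 0, \tfrac{28}{25}\right)$$

Observe that $\operatorname{proj}_{W^\perp} \mathbf{u}$ is orthogonal to both $\mathbf{v}_1$ and $\mathbf{v}_2$, so this vector is orthogonal to
each vector in the space $W$ spanned by $\mathbf{v}_1$ and $\mathbf{v}_2$, as it should be. ◄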
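
The projection formulas translate directly into a short computation. The following sketch, assuming the Euclidean inner product and exact rational arithmetic, applies Formula (12) to the data of Example 7 with $\mathbf{v}_2 = (-\tfrac{4}{5}, 0, \tfrac{3}{5})$; the function name `project` is a hypothetical helper, not notation from the text.

```python
from fractions import Fraction

def dot(u, v):
    """Euclidean inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

def project(u, basis):
    """Orthogonal projection of u on the span of an ORTHOGONAL basis, Formula (12)."""
    proj = [Fraction(0)] * len(u)
    for v in basis:
        c = Fraction(dot(u, v)) / Fraction(dot(v, v))  # <u, v> / ||v||^2
        proj = [p + c * a for p, a in zip(proj, v)]
    return proj

# Data of Example 7 (values assumed as stated above).
u = (1, 1, 1)
v1 = (0, 1, 0)
v2 = (Fraction(-4, 5), 0, Fraction(3, 5))

proj_w = project(u, [v1, v2])                 # projection of u on W
perp = [a - b for a, b in zip(u, proj_w)]     # component of u orthogonal to W
```

The function divides by $\|\mathbf{v}\|^2$, so it accepts any orthogonal basis, not only an orthonormal one. It does not check that the supplied vectors are in fact orthogonal; for a non-orthogonal basis the result is not the projection.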

If $W$ is a one-dimensional subspace of an inner product space $V$, say span$\{\mathbf{a}\}$, then
Formula (12) has only the one term

$$\operatorname{proj}_W \mathbf{u} = \frac{\langle \mathbf{u}, \mathbf{a} \rangle}{\|\mathbf{a}\|^2}\mathbf{a}$$

In the special case where $V$ is $R^3$ with the Euclidean inner product, this is exactly
Formula (10) of Section 3.3 for the orthogonal projection of $\mathbf{u}$ along $\mathbf{a}$. This suggests that
we can think of (12) as the sum of orthogonal projections on “axes” determined by the
basis vectors for the subspace $W$ (Figure 6.3.2).



The Gram-Schmidt Process

We have seen that orthonormal bases exhibit a variety of useful properties. Our next
theorem, which is the main result in this section, shows that every nonzero finite-dimensional
inner product space has an orthonormal basis. The proof of this result is extremely important
since it provides an algorithm, or method, for converting an arbitrary basis into an
orthonormal basis.


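THEOREM 6.3.5 Every nonzero finite-dimensional inner product space has an orthonormal basis.

▲ Figure 6.3.3  $\mathbf{v}_2 = \mathbf{u}_2 - \operatorname{proj}_{W_1} \mathbf{u}_2$

▲ Figure 6.3.4  $\mathbf{v}_3 = \mathbf{u}_3 - \operatorname{proj}_{W_2} \mathbf{u}_3$
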

Proof Let $W$ be any nonzero finite-dimensional subspace of an inner product space, and
suppose that $\{\mathbf{u}_1, \mathbf{u}_2, \dots, \mathbf{u}_r\}$ is any basis for $W$. It suffices to show that $W$ has an orthogonal
basis since the vectors in that basis can be normalized to obtain an orthonormal
basis. The following sequence of steps will produce an orthogonal basis $\{\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_r\}$
for $W$:

Step 1. Let $\mathbf{v}_1 = \mathbf{u}_1$.

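Step 2. As illustrated in Figure 6.3.3, we can obtain a vector $\mathbf{v}_2$ that is orthogonal to $\mathbf{v}_1$
by computing the component of $\mathbf{u}_2$ that is orthogonal to the space $W_1$ spanned
by $\mathbf{v}_1$. Using Formula (12) to perform this computation, we obtain

$$\mathbf{v}_2 = \mathbf{u}_2 - \operatorname{proj}_{W_1} \mathbf{u}_2 = \mathbf{u}_2 - \frac{\langle \mathbf{u}_2, \mathbf{v}_1 \rangle}{\|\mathbf{v}_1\|^2}\mathbf{v}_1$$

Of course, if $\mathbf{v}_2 = \mathbf{0}$, then $\mathbf{v}_2$ is not a basis vector. But this cannot happen, since
it would then follow from the preceding formula for $\mathbf{v}_2$ that

$$\mathbf{u}_2 = \frac{\langle \mathbf{u}_2, \mathbf{v}_1 \rangle}{\|\mathbf{v}_1\|^2}\mathbf{v}_1 = \frac{\langle \mathbf{u}_2, \mathbf{u}_1 \rangle}{\|\mathbf{u}_1\|^2}\mathbf{u}_1$$

which implies that $\mathbf{u}_2$ is a multiple of $\mathbf{u}_1$, contradicting the linear independence
of the basis $\{\mathbf{u}_1, \mathbf{u}_2, \dots, \mathbf{u}_r\}$.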

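Step 3. To construct a vector $\mathbf{v}_3$ that is orthogonal to both $\mathbf{v}_1$ and $\mathbf{v}_2$, we compute the
component of $\mathbf{u}_3$ orthogonal to the space $W_2$ spanned by $\mathbf{v}_1$ and $\mathbf{v}_2$ (Figure 6.3.4).
Using Formula (12) to perform this computation, we obtain

$$\mathbf{v}_3 = \mathbf{u}_3 - \operatorname{proj}_{W_2} \mathbf{u}_3 = \mathbf{u}_3 - \frac{\langle \mathbf{u}_3, \mathbf{v}_1 \rangle}{\|\mathbf{v}_1\|^2}\mathbf{v}_1 - \frac{\langle \mathbf{u}_3, \mathbf{v}_2 \rangle}{\|\mathbf{v}_2\|^2}\mathbf{v}_2$$

As in Step 2, the linear independence of $\{\mathbf{u}_1, \mathbf{u}_2, \dots, \mathbf{u}_r\}$ ensures that $\mathbf{v}_3 \ne \mathbf{0}$. We
leave the details for you.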
Step 4. To determine a vector $\mathbf{v}_4$ that is orthogonal to $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$, we compute the
component of $\mathbf{u}_4$ orthogonal to the space $W_3$ spanned by $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$. From (12),

$$\mathbf{v}_4 = \mathbf{u}_4 - \operatorname{proj}_{W_3} \mathbf{u}_4 = \mathbf{u}_4 - \frac{\langle \mathbf{u}_4, \mathbf{v}_1 \rangle}{\|\mathbf{v}_1\|^2}\mathbf{v}_1 - \frac{\langle \mathbf{u}_4, \mathbf{v}_2 \rangle}{\|\mathbf{v}_2\|^2}\mathbf{v}_2 - \frac{\langle \mathbf{u}_4, \mathbf{v}_3 \rangle}{\|\mathbf{v}_3\|^2}\mathbf{v}_3$$

Continuing in this way we will produce after $r$ steps an orthogonal set of nonzero
vectors $\{\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_r\}$. Since such sets are linearly independent, we will have produced
an orthogonal basis for the $r$-dimensional space $W$. By normalizing these basis vectors
we can obtain an orthonormal basis.

The step-by-step construction of an orthogonal (or orthonormal) basis given in 
the foregoing proof is called the Gram-Schmidt process. For reference, we provide the 
following summary of the steps. 


The Gram-Schmidt Process 

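To convert a basis $\{\mathbf{u}_1, \mathbf{u}_2, \dots, \mathbf{u}_r\}$ into an orthogonal basis $\{\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_r\}$, perform
the following computations:

Step 1. $\mathbf{v}_1 = \mathbf{u}_1$

Step 2. $\mathbf{v}_2 = \mathbf{u}_2 - \dfrac{\langle \mathbf{u}_2, \mathbf{v}_1 \rangle}{\|\mathbf{v}_1\|^2}\mathbf{v}_1$

Step 3. $\mathbf{v}_3 = \mathbf{u}_3 - \dfrac{\langle \mathbf{u}_3, \mathbf{v}_1 \rangle}{\|\mathbf{v}_1\|^2}\mathbf{v}_1 - \dfrac{\langle \mathbf{u}_3, \mathbf{v}_2 \rangle}{\|\mathbf{v}_2\|^2}\mathbf{v}_2$

Step 4. $\mathbf{v}_4 = \mathbf{u}_4 - \dfrac{\langle \mathbf{u}_4, \mathbf{v}_1 \rangle}{\|\mathbf{v}_1\|^2}\mathbf{v}_1 - \dfrac{\langle \mathbf{u}_4, \mathbf{v}_2 \rangle}{\|\mathbf{v}_2\|^2}\mathbf{v}_2 - \dfrac{\langle \mathbf{u}_4, \mathbf{v}_3 \rangle}{\|\mathbf{v}_3\|^2}\mathbf{v}_3$

(continue for $r$ steps)

Optional Step. To convert the orthogonal basis into an orthonormal basis
$\{\mathbf{q}_1, \mathbf{q}_2, \dots, \mathbf{q}_r\}$, normalize the orthogonal basis vectors.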
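
The steps summarized above can be sketched as a short program. This is a minimal illustration, assuming the Euclidean inner product by default and exact rational arithmetic for integer or fractional input; the function names `gram_schmidt` and `normalize_all` and the `inner` parameter are illustrative choices rather than notation from the text.

```python
import math
from fractions import Fraction

def dot(u, v):
    """Euclidean inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

def gram_schmidt(basis, inner=dot):
    """Steps 1 to r: subtract from each u_k its projections on v_1, ..., v_{k-1}."""
    orthogonal = []
    for u in basis:
        v = [Fraction(a) for a in u]
        for w in orthogonal:
            c = inner(u, w) / inner(w, w)  # <u_k, v_i> / ||v_i||^2
            v = [a - c * b for a, b in zip(v, w)]
        if all(a == 0 for a in v):
            raise ValueError("input vectors are linearly dependent")
        orthogonal.append(v)
    return orthogonal

def normalize_all(vectors, inner=dot):
    """Optional step: scale each orthogonal vector to norm 1 (floating point)."""
    return [[float(a) / math.sqrt(inner(v, v)) for a in v] for v in vectors]

# A basis for R^3 with integer entries.
vs = gram_schmidt([(1, 1, 1), (0, 1, 1), (0, 0, 1)])
qs = normalize_all(vs)
```

Keeping the orthogonal vectors as exact fractions and normalizing only at the end avoids carrying square roots through the intermediate steps. In floating-point work with nearly dependent vectors, this classical form of the process can lose orthogonality, so numerical libraries generally use other methods.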



Jørgen Pedersen Gram
(1850-1916) 


Erhard Schmidt (1876-1959) was a German mathematician
who studied for his doctoral degree at Göttingen University under David
Hilbert, one of the giants of modern mathematics. For most of his life he taught 
at Berlin University where, in addition to making important contributions to 
many branches of mathematics, he fashioned some of Hilbert's ideas into a 
general concept, called a Hilbert space— a fundamental structure in the study 
of infinite-dimensional vector spaces. He first described the process that bears 
his name in a paper on integral equations that he published in 1907. 

Gram was a Danish actuary whose early education was at vil- 
lage schools supplemented by private tutoring. He obtained a doctorate degree 
in mathematics while working for the Hafnia Life Insurance Company, where 
he specialized in the mathematics of accident insurance. It was in his disser- 
tation that his contributions to the Gram-Schmidt process were formulated. 
He eventually became interested in abstract mathematics and received a gold 
medal from the Royal Danish Society of Sciences and Letters in recognition of 
his work. His lifelong interest in applied mathematics never wavered, however, 
and he produced a variety of treatises on Danish forest management. 

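► EXAMPLE 8 Using the Gram-Schmidt Process

Assume that the vector space $R^3$ has the Euclidean inner product. Apply the
Gram-Schmidt process to transform the basis vectors

$$\mathbf{u}_1 = (1, 1, 1), \quad \mathbf{u}_2 = (0, 1, 1), \quad \mathbf{u}_3 = (0, 0, 1)$$

into an orthogonal basis $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$, and then normalize the orthogonal basis vectors to
obtain an orthonormal basis $\{\mathbf{q}_1, \mathbf{q}_2, \mathbf{q}_3\}$.


Solution

Step 1. $\mathbf{v}_1 = \mathbf{u}_1 = (1, 1, 1)$

Step 2. $$\mathbf{v}_2 = \mathbf{u}_2 - \operatorname{proj}_{W_1} \mathbf{u}_2 = \mathbf{u}_2 - \frac{\langle \mathbf{u}_2, \mathbf{v}_1 \rangle}{\|\mathbf{v}_1\|^2}\mathbf{v}_1 = (0, 1, 1) - \frac{2}{3}(1, 1, 1) = \left(-\frac{2}{3}, \frac{1}{3}, \frac{1}{3}\right)$$

Step 3. $$\mathbf{v}_3 = \mathbf{u}_3 - \operatorname{proj}_{W_2} \mathbf{u}_3 = \mathbf{u}_3 - \frac{\langle \mathbf{u}_3, \mathbf{v}_1 \rangle}{\|\mathbf{v}_1\|^2}\mathbf{v}_1 - \frac{\langle \mathbf{u}_3, \mathbf{v}_2 \rangle}{\|\mathbf{v}_2\|^2}\mathbf{v}_2 = (0, 0, 1) - \frac{1}{3}(1, 1, 1) - \frac{1/3}{2/3}\left(-\frac{2}{3}, \frac{1}{3}, \frac{1}{3}\right) = \left(0, -\frac{1}{2}, \frac{1}{2}\right)$$

Thus,

$$\mathbf{v}_1 = (1, 1, 1), \quad \mathbf{v}_2 = \left(-\frac{2}{3}, \frac{1}{3}, \frac{1}{3}\right), \quad \mathbf{v}_3 = \left(0, -\frac{1}{2}, \frac{1}{2}\right)$$

form an orthogonal basis for $R^3$. The norms of these vectors are

$$\|\mathbf{v}_1\| = \sqrt{3}, \quad \|\mathbf{v}_2\| = \frac{\sqrt{6}}{3}, \quad \|\mathbf{v}_3\| = \frac{1}{\sqrt{2}}$$

so an orthonormal basis for $R^3$ is

$$\mathbf{q}_1 = \frac{\mathbf{v}_1}{\|\mathbf{v}_1\|} = \left(\frac{1}{\sqrt{3}}, \frac{1}{\sqrt{3}}, \frac{1}{\sqrt{3}}\right), \quad \mathbf{q}_2 = \frac{\mathbf{v}_2}{\|\mathbf{v}_2\|} = \left(-\frac{2}{\sqrt{6}}, \frac{1}{\sqrt{6}}, \frac{1}{\sqrt{6}}\right), \quad \mathbf{q}_3 = \frac{\mathbf{v}_3}{\|\mathbf{v}_3\|} = \left(0, -\frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}}\right)$$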

Remark In the last example we normalized at the end to convert the orthogonal basis into an 
orthonormal basis. Alternatively, we could have normalized each orthogonal basis vector as soon 
as it was obtained, thereby producing an orthonormal basis step by step. However, that procedure 
generally has the disadvantage in hand calculation of producing more square roots to manipulate. 
A more useful variation is to “scale” the orthogonal basis vectors at each step to eliminate some of 
the fractions. For example, after Step 2 above, we could have multiplied by 3 to produce (-2, 1, 1)
as the second orthogonal basis vector, thereby simplifying the calculations in Step 3. 


calculus required ► EXAMPLE 9 Legendre Polynomials 

Let the vector space $P_2$ have the inner product

$$\langle \mathbf{p}, \mathbf{q} \rangle = \int_{-1}^{1} p(x)q(x)\,dx$$

Apply the Gram-Schmidt process to transform the standard basis $\{1, x, x^2\}$ for $P_2$ into
an orthogonal basis $\{\phi_1(x), \phi_2(x), \phi_3(x)\}$.
Extending Orthonormal
Sets to Orthonormal Bases 


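Solution Take $\mathbf{u}_1 = 1$, $\mathbf{u}_2 = x$, and $\mathbf{u}_3 = x^2$.

Step 1. $\mathbf{v}_1 = \mathbf{u}_1 = 1$

Step 2. We have

$$\langle \mathbf{u}_2, \mathbf{v}_1 \rangle = \int_{-1}^{1} x\,dx = 0$$

so

$$\mathbf{v}_2 = \mathbf{u}_2 - \frac{\langle \mathbf{u}_2, \mathbf{v}_1 \rangle}{\|\mathbf{v}_1\|^2}\mathbf{v}_1 = \mathbf{u}_2 = x$$

Step 3. We have

$$\langle \mathbf{u}_3, \mathbf{v}_1 \rangle = \int_{-1}^{1} x^2\,dx = \left.\frac{x^3}{3}\right]_{-1}^{1} = \frac{2}{3}$$

$$\langle \mathbf{u}_3, \mathbf{v}_2 \rangle = \int_{-1}^{1} x^3\,dx = \left.\frac{x^4}{4}\right]_{-1}^{1} = 0$$

$$\|\mathbf{v}_1\|^2 = \langle \mathbf{v}_1, \mathbf{v}_1 \rangle = \int_{-1}^{1} 1\,dx = \Big. x \Big]_{-1}^{1} = 2$$

so

$$\mathbf{v}_3 = \mathbf{u}_3 - \frac{\langle \mathbf{u}_3, \mathbf{v}_1 \rangle}{\|\mathbf{v}_1\|^2}\mathbf{v}_1 - \frac{\langle \mathbf{u}_3, \mathbf{v}_2 \rangle}{\|\mathbf{v}_2\|^2}\mathbf{v}_2 = x^2 - \frac{1}{3}$$

Thus, we have obtained the orthogonal basis $\{\phi_1(x), \phi_2(x), \phi_3(x)\}$ in which

$$\phi_1(x) = 1, \quad \phi_2(x) = x, \quad \phi_3(x) = x^2 - \frac{1}{3}$$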

Remark The orthogonal basis vectors in the last example are often scaled so all three functions
have a value of 1 at $x = 1$. The resulting polynomials

$$1, \quad x, \quad \frac{1}{2}(3x^2 - 1)$$

which are known as the first three Legendre polynomials, play an important role in a variety of
applications. The scaling does not affect the orthogonality.
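
The Legendre computation can be checked with exact arithmetic by representing a polynomial as its list of coefficients. The sketch below is an illustration only: the representation (constant term first) and the names `poly_inner`, `gram_schmidt`, and `scale_to_one_at_one` are assumptions made for this example, and the inner product is the integral over $[-1, 1]$ used in Example 9.

```python
from fractions import Fraction

def poly_inner(p, q):
    """<p, q> = integral over [-1, 1] of p(x) q(x) dx, for coefficient lists."""
    total = Fraction(0)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            # x^(i+j) integrates to 2/(i+j+1) when i+j is even and to 0 when odd.
            if (i + j) % 2 == 0:
                total += Fraction(a) * Fraction(b) * Fraction(2, i + j + 1)
    return total

def gram_schmidt(basis, inner):
    """Gram-Schmidt process without normalization, for a general inner product."""
    orthogonal = []
    for u in basis:
        v = [Fraction(a) for a in u]
        for w in orthogonal:
            c = inner(u, w) / inner(w, w)
            v = [a - c * b for a, b in zip(v, w)]
        orthogonal.append(v)
    return orthogonal

def scale_to_one_at_one(p):
    """Rescale p so that p(1) = 1; p(1) is the sum of the coefficients."""
    value_at_one = sum(p)
    return [a / value_at_one for a in p]

# Standard basis {1, x, x^2} of P_2 as coefficient lists.
phis = gram_schmidt([[1, 0, 0], [0, 1, 0], [0, 0, 1]], poly_inner)
legendre = [scale_to_one_at_one(p) for p in phis]
```

Here `phis[2]` holds the coefficients of $x^2 - \tfrac{1}{3}$ and `legendre[2]` those of $\tfrac{1}{2}(3x^2 - 1)$.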


Recall from part ( b ) of Theorem 4.5.5 that a linearly independent set in a finite-dimensional 
vector space can be enlarged to a basis by adding appropriate vectors. The following the- 
orem is an analog of that result for orthogonal and orthonormal sets in finite-dimensional 
inner product spaces. 


THEOREM 6.3.6 If $W$ is a finite-dimensional inner product space, then:

(a) Every orthogonal set of nonzero vectors in $W$ can be enlarged to an orthogonal
basis for $W$.

(b) Every orthonormal set in $W$ can be enlarged to an orthonormal basis for $W$.


We will prove part (b) and leave part (a) as an exercise.

Proof (b) Suppose that $S = \{\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_s\}$ is an orthonormal set of vectors in $W$.
Part (b) of Theorem 4.5.5 tells us that we can enlarge $S$ to some basis

$$S' = \{\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_s, \mathbf{v}_{s+1}, \dots, \mathbf{v}_k\}$$

for $W$. If we now apply the Gram-Schmidt process to the set $S'$, then the vectors
$\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_s$ will not be affected since they are already orthonormal, and the resulting
set

$$S'' = \{\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_s, \mathbf{v}'_{s+1}, \dots, \mathbf{v}'_k\}$$

will be an orthonormal basis for $W$.



374 Inner Product Spaces 


OPTIONAL

QR-Decomposition 


In recent years a numerical algorithm based on the Gram-Schmidt process, and known
as QR-decomposition, has assumed growing importance as the mathematical foundation
for a wide variety of numerical algorithms, including those for computing eigenvalues of 
large matrices. The technical aspects of such algorithms are discussed in textbooks that 
specialize in the numerical aspects of linear algebra. However, we will discuss some of 
the underlying ideas here. We begin by posing the following problem. 


Problem If $A$ is an $m \times n$ matrix with linearly independent column vectors, and if
$Q$ is the matrix that results by applying the Gram-Schmidt process to the column
vectors of $A$, what relationship, if any, exists between $A$ and $Q$?


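To solve this problem, suppose that the column vectors of $A$ are $\mathbf{u}_1, \mathbf{u}_2, \dots, \mathbf{u}_n$ and
that $Q$ has orthonormal column vectors $\mathbf{q}_1, \mathbf{q}_2, \dots, \mathbf{q}_n$. Thus, $A$ and $Q$ can be written
in partitioned form as

$$A = [\mathbf{u}_1 \mid \mathbf{u}_2 \mid \cdots \mid \mathbf{u}_n] \quad \text{and} \quad Q = [\mathbf{q}_1 \mid \mathbf{q}_2 \mid \cdots \mid \mathbf{q}_n]$$

It follows from Theorem 6.3.2(b) that $\mathbf{u}_1, \mathbf{u}_2, \dots, \mathbf{u}_n$ are expressible in terms of the vectors
$\mathbf{q}_1, \mathbf{q}_2, \dots, \mathbf{q}_n$ as

$$\begin{aligned}
\mathbf{u}_1 &= \langle \mathbf{u}_1, \mathbf{q}_1 \rangle\mathbf{q}_1 + \langle \mathbf{u}_1, \mathbf{q}_2 \rangle\mathbf{q}_2 + \cdots + \langle \mathbf{u}_1, \mathbf{q}_n \rangle\mathbf{q}_n \\
\mathbf{u}_2 &= \langle \mathbf{u}_2, \mathbf{q}_1 \rangle\mathbf{q}_1 + \langle \mathbf{u}_2, \mathbf{q}_2 \rangle\mathbf{q}_2 + \cdots + \langle \mathbf{u}_2, \mathbf{q}_n \rangle\mathbf{q}_n \\
&\;\;\vdots \\
\mathbf{u}_n &= \langle \mathbf{u}_n, \mathbf{q}_1 \rangle\mathbf{q}_1 + \langle \mathbf{u}_n, \mathbf{q}_2 \rangle\mathbf{q}_2 + \cdots + \langle \mathbf{u}_n, \mathbf{q}_n \rangle\mathbf{q}_n
\end{aligned}$$

Recalling from Section 1.3 (Example 9) that the $j$th column vector of a matrix product
is a linear combination of the column vectors of the first factor with coefficients coming
from the $j$th column of the second factor, it follows that these relationships can be
expressed in matrix form as

$$[\mathbf{u}_1 \mid \mathbf{u}_2 \mid \cdots \mid \mathbf{u}_n] = [\mathbf{q}_1 \mid \mathbf{q}_2 \mid \cdots \mid \mathbf{q}_n]
\begin{bmatrix}
\langle \mathbf{u}_1, \mathbf{q}_1 \rangle & \langle \mathbf{u}_2, \mathbf{q}_1 \rangle & \cdots & \langle \mathbf{u}_n, \mathbf{q}_1 \rangle \\
\langle \mathbf{u}_1, \mathbf{q}_2 \rangle & \langle \mathbf{u}_2, \mathbf{q}_2 \rangle & \cdots & \langle \mathbf{u}_n, \mathbf{q}_2 \rangle \\
\vdots & \vdots & & \vdots \\
\langle \mathbf{u}_1, \mathbf{q}_n \rangle & \langle \mathbf{u}_2, \mathbf{q}_n \rangle & \cdots & \langle \mathbf{u}_n, \mathbf{q}_n \rangle
\end{bmatrix}$$

or more briefly as

$$A = QR \tag{15}$$

where $R$ is the second factor in the product. However, it is a property of the
Gram-Schmidt process that for $j \ge 2$, the vector $\mathbf{q}_j$ is orthogonal to $\mathbf{u}_1, \mathbf{u}_2, \dots, \mathbf{u}_{j-1}$. Thus, all
entries below the main diagonal of $R$ are zero, and $R$ has the form

$$R = \begin{bmatrix}
\langle \mathbf{u}_1, \mathbf{q}_1 \rangle & \langle \mathbf{u}_2, \mathbf{q}_1 \rangle & \cdots & \langle \mathbf{u}_n, \mathbf{q}_1 \rangle \\
0 & \langle \mathbf{u}_2, \mathbf{q}_2 \rangle & \cdots & \langle \mathbf{u}_n, \mathbf{q}_2 \rangle \\
\vdots & \vdots & & \vdots \\
0 & 0 & \cdots & \langle \mathbf{u}_n, \mathbf{q}_n \rangle
\end{bmatrix} \tag{16}$$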
We leave it for you to show that $R$ is invertible by showing that its diagonal entries
are nonzero. Thus, Equation (15) is a factorization of $A$ into the product of a matrix $Q$
with orthonormal column vectors and an invertible upper triangular matrix $R$. We call
Equation (15) a QR-decomposition of $A$. In summary, we have the following theorem.

It is common in numerical linear algebra to say that a matrix with linearly independent
columns has full column rank.


THEOREM 6.3.7 QR-Decomposition

If A is an m x n matrix with linearly independent column vectors, then A can be fac- 
tored as 

A = QR 

where Q is an m x n matrix with orthonormal column vectors, and R is an n x n 
invertible upper triangular matrix. 


Recall from Theorem 5.1.5 (the Equivalence Theorem) that a square matrix has 
linearly independent column vectors if and only if it is invertible. Thus, it follows from 
Theorem 6.3.7 that every invertible matrix has a QR-decomposition. 


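► EXAMPLE 10 QR-Decomposition of a 3 x 3 Matrix

Find a QR-decomposition of

$$A = \begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 1 & 1 & 1 \end{bmatrix}$$

Solution The column vectors of $A$ are

$$\mathbf{u}_1 = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}, \quad \mathbf{u}_2 = \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}, \quad \mathbf{u}_3 = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}$$

Applying the Gram-Schmidt process with normalization to these column vectors yields
the orthonormal vectors (see Example 8)

$$\mathbf{q}_1 = \begin{bmatrix} \frac{1}{\sqrt{3}} \\ \frac{1}{\sqrt{3}} \\ \frac{1}{\sqrt{3}} \end{bmatrix}, \quad \mathbf{q}_2 = \begin{bmatrix} -\frac{2}{\sqrt{6}} \\ \frac{1}{\sqrt{6}} \\ \frac{1}{\sqrt{6}} \end{bmatrix}, \quad \mathbf{q}_3 = \begin{bmatrix} 0 \\ -\frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} \end{bmatrix}$$

Thus, it follows from Formula (16) that $R$ is

$$R = \begin{bmatrix} \langle \mathbf{u}_1, \mathbf{q}_1 \rangle & \langle \mathbf{u}_2, \mathbf{q}_1 \rangle & \langle \mathbf{u}_3, \mathbf{q}_1 \rangle \\ 0 & \langle \mathbf{u}_2, \mathbf{q}_2 \rangle & \langle \mathbf{u}_3, \mathbf{q}_2 \rangle \\ 0 & 0 & \langle \mathbf{u}_3, \mathbf{q}_3 \rangle \end{bmatrix} = \begin{bmatrix} \frac{3}{\sqrt{3}} & \frac{2}{\sqrt{3}} & \frac{1}{\sqrt{3}} \\ 0 & \frac{2}{\sqrt{6}} & \frac{1}{\sqrt{6}} \\ 0 & 0 & \frac{1}{\sqrt{2}} \end{bmatrix}$$

from which it follows that a QR-decomposition of $A$ is

$$\underbrace{\begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 1 & 1 & 1 \end{bmatrix}}_{A} = \underbrace{\begin{bmatrix} \frac{1}{\sqrt{3}} & -\frac{2}{\sqrt{6}} & 0 \\ \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{6}} & -\frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{6}} & \frac{1}{\sqrt{2}} \end{bmatrix}}_{Q} \underbrace{\begin{bmatrix} \frac{3}{\sqrt{3}} & \frac{2}{\sqrt{3}} & \frac{1}{\sqrt{3}} \\ 0 & \frac{2}{\sqrt{6}} & \frac{1}{\sqrt{6}} \\ 0 & 0 & \frac{1}{\sqrt{2}} \end{bmatrix}}_{R}$$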
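
The construction of $Q$ and $R$ can be sketched in code. The version below is a minimal floating-point illustration, not a production algorithm: it normalizes each vector as soon as it is obtained (the step-by-step variant mentioned in the Remark after Example 8), represents a matrix by its list of columns, and uses the hypothetical name `qr_gram_schmidt`. Numerical libraries typically compute QR-decompositions by other methods, such as Householder reflections.

```python
import math

def dot(u, v):
    """Euclidean inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

def qr_gram_schmidt(columns, tol=1e-12):
    """Return (q, r): q lists the orthonormal columns of Q, r lists the rows of R."""
    n = len(columns)
    q = []
    for u in columns:
        v = list(u)
        for w in q:
            c = dot(u, w)  # w already has norm 1, so no division is needed
            v = [a - c * b for a, b in zip(v, w)]
        norm = math.sqrt(dot(v, v))
        if norm < tol:
            raise ValueError("columns are (numerically) linearly dependent")
        q.append([a / norm for a in v])
    # Formula (16): R[i][j] = <u_j, q_i> on and above the diagonal, 0 below it.
    r = [[dot(columns[j], q[i]) if i <= j else 0.0 for j in range(n)]
         for i in range(n)]
    return q, r

# Columns of the matrix A in Example 10.
cols = [(1, 1, 1), (0, 1, 1), (0, 0, 1)]
q, r = qr_gram_schmidt(cols)

# Column j of QR is the combination of the q_i with coefficients from column j of R.
rebuilt = [[sum(r[i][j] * q[i][k] for i in range(3)) for k in range(3)]
           for j in range(3)]
```

Comparing `rebuilt` with `cols` entry by entry, up to rounding, confirms $A = QR$ for this matrix.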
Exercise Set 6.3

1. In each part, determine whether the set of vectors is orthog- 
onal and whether it is orthonormal with respect to the Eu- 
clidean inner product on R 2 . 

(a) (0, 1), (2, 0) 

(b) (“3’ 7f)’ (ti 1 i) 

(c) _ 7l)’ (tt 7l) 

(d) (0, 0), (0, 1) 

2. In each part, determine whether the set of vectors is orthog- 
onal and whether it is orthonormal with respect to the Eu- 
clidean inner product on R 3 . 

(a) (71 ,0 ’7l)’ ( _ 7I’ 0 ’7l) 

(b) (§,§,-§). 

(c) (1,0,0), (0, ^), (0,0, 1) 

(d) (vi- ( t2’ _ 7I’ 0 ) 

3. In each part, determine whether the set of vectors is orthog- 
onal with respect to the standard inner product on P 2 (see 
Example 7 of Section 6.1). 

(a) pt(x) = | - | x + fx 2 , p 2 (x) = f + 5* - i* 2 , 

Pi(x) = | + lx + §x 2 

(b) pt(x) = 1, p 2 (x) = ^x + -j^x 2 , p 3 (x) = x 2 

4. In each part, determine whether the set of vectors is orthog- 
onal with respect to the standard inner product on M 22 (see 
Example 6 of Section 6.1). 


(a) 

'1 

0 

O' 

0 

, 

"0 

1 


2“ 

3 

2 

, 

0 

2 


, 

"0 

2 

1 “ 

3 

2 





l_3 


3 J 


L 




L3 

3 J 

(b) 

'1 

O' 


'0 

r 


'0 

O' 


'0 

O' 


0 

0 

’ 

0 

0 

’ 

1 

1 

’ 

1 - 

-1 



In Exercises 5-6, show that the column vectors of A form an
orthogonal basis for the column space of A with respect to the 
Euclidean inner product, and then find an orthonormal basis for 
that column space. 



1 

2 

o' 


-1 1 

5 2 

1 - 

3 

5. A = 

0 

0 

5 

II 

so 

1 1 

5 2 

1 

3 


-1 

2 

0 


A 0 

2 






3- 


7. Verify that the vectors 

V! = (-§, | , 0) , v 2 = (f, f,0), v 3 = (0,0, 1) 

form an orthonormal basis for $R^3$ with respect to the Euclidean
inner product, and then use Theorem 6.3.2(b) to express
the vector $\mathbf{u} = (1, -2, 2)$ as a linear combination of $\mathbf{v}_1$,
$\mathbf{v}_2$, and $\mathbf{v}_3$.


8. Use Theorem 6.3.2(b) to express the vector $\mathbf{u} = (3, -7, 4)$ as
a linear combination of the vectors $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$ in Exercise 7.

9. Verify that the vectors 

vi = (2, -2,1), v 2 = (2, 1, —2), v 3 = (1, 2, 2) 

form an orthogonal basis for R 3 with respect to the Euclidean 
inner product, and then use Theorem 6.3.2(a) to express the 
vector u = (—1, 0, 2) as a linear combination of Vj, v 2 , and v 3 . 

10. Verify that the vectors 

Vi = (1,-1, 2,-1), v 2 = (-2,2, 3,2), 
v 3 = (1,2, 0,-1), v 4 = (1,0,0, 1) 

form an orthogonal basis for R 4 with respect to the Euclidean 
inner product, and then use Theorem 6.3.2(a) to express the 
vector $\mathbf{u} = (1, 1, 1, 1)$ as a linear combination of $\mathbf{v}_1$, $\mathbf{v}_2$, $\mathbf{v}_3$,
and $\mathbf{v}_4$.

In Exercises 11-14, find the coordinate vector $(\mathbf{u})_S$ for the vector
$\mathbf{u}$ and the basis $S$ that were given in the stated exercise.

11. Exercise 7 12. Exercise 8 

13. Exercise 9 14. Exercise 10 

In Exercises 15-18, let $R^2$ have the Euclidean inner product.

(a) Find the orthogonal projection of u onto the line spanned by 
the vector v. 

(b) Find the component of u orthogonal to the line spanned by 
the vector v, and confirm that this component is orthogonal 
to the line. 

15. u = (-1, 6); v = (f, |) 16. u = (2, 3); v = (£, {§) 

17. u = (2, 3); v = (1, 1) 18. u = (3, -1); v = (3, 4) 

In Exercises 19-22, let R 3 have the Euclidean inner product. 

(a) Find the orthogonal projection of u onto the plane spanned 
by the vectors vi and v 2 . 

(b) Find the component of $\mathbf{u}$ orthogonal to the plane spanned
by the vectors $\mathbf{v}_1$ and $\mathbf{v}_2$, and confirm that this component is
orthogonal to the plane.

19. u = (4, 2, 1); V! = (|, f , -§), v 2 = (|, |, §) 

20. u = (3, -1, 2); vj = (^, jj, -jg), v 2 = (^, jj, 

21. u = (1.0, 3); vi = (1, -2, 1), v 2 = (2, 1,0) 

22. u = (1,0, 2); V! = (3, 1, 2), v 2 = (-1, 1, 1) 

In Exercises 23-24, the vectors v1 and v2 are orthogonal with 
respect to the Euclidean inner product on R^4. Find the orthogonal 
projection of b = (1, 2, 0, -2) on the subspace W spanned by 
these vectors. 

23. v1 = (1, 1, 1, 1), v2 = (1, 1, -1, -1) 

24. v1 = (0, 1, -4, -1), v2 = (3, 5, 1, 1) 

In Exercises 25-26, the vectors v1, v2, and v3 are orthonormal 
with respect to the Euclidean inner product on R^4. Find the 
orthogonal projection of b = (1, 2, 0, -1) onto the subspace W 
spanned by these vectors. 

25. V! = (0. ^=, -751), v 2 = G> b 

V3 = (tB’ 0, 7n’~Js) 


26 . vi = (5, 5. 5. 5), = 

V 3 = (i,-i,i,-i) 


In Exercises 27-28, let R^2 have the Euclidean inner product 
and use the Gram-Schmidt process to transform the basis {u1, u2} 
into an orthonormal basis. Draw both sets of basis vectors in the 
xy-plane. 


27. u1 = (1, -3), u2 = (2, 2)        28. u1 = (1, 0), u2 = (3, -5) 

In Exercises 29-30, let R^3 have the Euclidean inner product and 
use the Gram-Schmidt process to transform the basis {u1, u2, u3} 
into an orthonormal basis. 


38. Verify that the set of vectors {(1, 0), (0, 1)} is orthogonal with 
respect to the inner product ⟨u, v⟩ = 4u_1v_1 + u_2v_2 on R^2; then 
convert it to an orthonormal set by normalizing the vectors. 

39. Find vectors x and y in R^2 that are orthonormal with respect 
to the inner product ⟨u, v⟩ = 3u_1v_1 + 2u_2v_2 but are not 
orthonormal with respect to the Euclidean inner product. 

40. In Example 3 of Section 4.9 we found the orthogonal projection 
of the vector x = (1, 5) onto the line through the origin 
making an angle of π/6 radians with the positive x-axis. Solve 
that same problem using Theorem 6.3.4. 

41. This exercise illustrates that the orthogonal projection result- 
ing from Formula (12) in Theorem 6.3.4 does not depend on 
which orthogonal basis vectors are used. 

(a) Let R^3 have the Euclidean inner product, and let W be the 
subspace of R^3 spanned by the orthogonal vectors 

v1 = (1, 0, 1) and v2 = (0, 1, 0) 

Show that the orthogonal vectors 

v1' = (1, 1, 1) and v2' = (1, -2, 1) 

span the same subspace W. 

(b) Let u = (-3, 1, 7) and show that the same vector proj_W u 
results regardless of which of the bases in part (a) is used 
for its computation. 


29. u1 = (1, 1, 1), u2 = (-1, 1, 0), u3 = (1, 2, 1) 

30. u1 = (1, 0, 0), u2 = (3, 7, -2), u3 = (0, 4, 1) 

31. Let R^4 have the Euclidean inner product. Use the Gram-Schmidt 
process to transform the basis {u1, u2, u3, u4} into an 
orthonormal basis. 

u1 = (0, 2, 1, 0), u2 = (1, -1, 0, 0), 

u3 = (1, 2, 0, -1), u4 = (1, 0, 0, 1) 

42. (Calculus required) Use Theorem 6.3.2(a) to express the 
following polynomials as linear combinations of the first three 
Legendre polynomials (see the Remark following Example 9). 

(a) 1 + x + 4x^2    (b) 2 - 7x^2    (c) 4 + 3x 


32. Let R^3 have the Euclidean inner product. Find an orthonormal 
basis for the subspace spanned by (0, 1, 2), (-1, 0, 1), 
(-1, 1, 3). 

33. Let b and W be as in Exercise 23. Find vectors w1 in W and 
w2 in W^⊥ such that b = w1 + w2. 


43. ( Calculus required) Let P 2 have the inner product 


<P,q)= / p(x)q(x)dx 
Jo 

Apply the Gram-Schmidt process to transform the standard 
basis S = {1, x, x 2 } into an orthonormal basis. 


34. Let b and W be as in Exercise 25. Find vectors w1 in W and 
w2 in W^⊥ such that b = w1 + w2. 

35. Let R^3 have the Euclidean inner product. The subspace of 
R^3 spanned by the vectors u1 = (1, 1, 1) and u2 = (2, 0, -1) 
is a plane passing through the origin. Express w = (1, 2, 3) 
in the form w = w1 + w2, where w1 lies in the plane and w2 is 
perpendicular to the plane. 

36. Let R^4 have the Euclidean inner product. Express the vector 
w = (-1, 2, 6, 0) in the form w = w1 + w2, where w1 is in the 
space W spanned by u1 = (-1, 0, 1, 2) and u2 = (0, 1, 0, 1), 
and w2 is orthogonal to W. 

37. Let R^3 have the inner product 

⟨u, v⟩ = u_1v_1 + 2u_2v_2 + 3u_3v_3 

Use the Gram-Schmidt process to transform u1 = (1, 1, 1), 
u2 = (1, 1, 0), u3 = (1, 0, 0) into an orthonormal basis. 


44. Find an orthogonal basis for the column space of the matrix 

1 -5- 

1 1 

-2 5 

8 — 7_ 

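In Exercises 45-48, we obtained the column vectors of Q by 
applying the Gram-Schmidt process to the column vectors of A. 
Find a QR-decomposition of the matrix A. 

45. \[ A = \begin{bmatrix} 1 & -1 \\ 2 & 3 \end{bmatrix}, \quad Q = \begin{bmatrix} \frac{1}{\sqrt{5}} & -\frac{2}{\sqrt{5}} \\ \frac{2}{\sqrt{5}} & \frac{1}{\sqrt{5}} \end{bmatrix} \]

46. \[ A = \begin{bmatrix} 1 & 2 \\ 0 & 1 \\ 1 & 4 \end{bmatrix}, \quad Q = \begin{bmatrix} \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{3}} \\ 0 & \frac{1}{\sqrt{3}} \\ \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{3}} \end{bmatrix} \]

47. \[ A = \begin{bmatrix} 1 & 0 & 2 \\ 0 & 1 & 1 \\ 1 & 2 & 0 \end{bmatrix}, \quad Q = \begin{bmatrix} \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{3}} & \frac{1}{\sqrt{6}} \\ 0 & \frac{1}{\sqrt{3}} & \frac{2}{\sqrt{6}} \\ \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{3}} & -\frac{1}{\sqrt{6}} \end{bmatrix} \]

48. \[ A = \begin{bmatrix} 1 & 2 & 1 \\ 1 & 1 & 1 \\ 0 & 3 & 1 \end{bmatrix}, \quad Q = \begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{\sqrt{2}}{2\sqrt{19}} & -\frac{3}{\sqrt{19}} \\ \frac{1}{\sqrt{2}} & -\frac{\sqrt{2}}{2\sqrt{19}} & \frac{3}{\sqrt{19}} \\ 0 & \frac{3\sqrt{2}}{\sqrt{19}} & \frac{1}{\sqrt{19}} \end{bmatrix} \]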

49. Find a QR-decomposition of the matrix 








' 1 

0 r 





-1 1 1 _ 

50. In the Remark following Example 8 we discussed two alter- 
native ways to perform the calculations in the Gram-Schmidt 
process: normalizing each orthogonal basis vector as soon as 
it is calculated and scaling the orthogonal basis vectors at each 
step to eliminate fractions. Try these methods in Example 8 . 

Working with Proofs 

51. Prove part (a) of Theorem 6.3.6. 

52. In Step 3 of the proof of Theorem 6.3.5, it was stated that “the 
linear independence of {u1, u2, ..., un} ensures that v3 ≠ 0.” 
Prove this statement. 

53. Prove that the diagonal entries of R in Formula (16) are 
nonzero. 

54. Show that matrix Q in Example 10 has the property 
QQ^T = I_3, and prove that every m x n matrix Q with 
orthonormal column vectors has the property QQ^T = I_m. 

55. (a) Prove that if W is a subspace of a finite-dimensional 
vector space V, then the mapping T: V -> W defined by 
T(v) = proj_W v is a linear transformation. 

(b) What are the range and kernel of the transformation in 
part (a)? 


True-False Exercises 

TF. In parts (a)-(f) determine whether the statement is true or 
false, and justify your answer. 

(a) Every linearly independent set of vectors in an inner product 
space is orthogonal. 

(b) Every orthogonal set of vectors in an inner product space is 
linearly independent. 

(c) Every nontrivial subspace of R 3 has an orthonormal basis with 
respect to the Euclidean inner product. 

(d) Every nonzero finite-dimensional inner product space has an 
orthonormal basis. 

(e) proj_W x is orthogonal to every vector of W. 

(f) If A is an n x n matrix with a nonzero determinant, then A 
has a QR-decomposition. 

Working with Technology 

T1. (a) Use the Gram-Schmidt process to find an orthonormal 
basis relative to the Euclidean inner product for the column 
space of 

\[ A = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 0 & 0 & 1 \\ 0 & 1 & 0 & 2 \\ 2 & -1 & 1 & 1 \end{bmatrix} \]

(b) Use the method of Example 9 to find a QR-decomposition 
of A. 

T2. Let P_4 have the evaluation inner product at the points 
-2, -1, 0, 1, 2. Find an orthogonal basis for P_4 relative to this 
inner product by applying the Gram-Schmidt process to the vectors 

p_0 = 1, p_1 = x, p_2 = x^2, p_3 = x^3, p_4 = x^4 


6.4 Best Approximation; Least Squares 

There are many applications in which some linear system Ax = b of m equations in n 
unknowns should be consistent on physical grounds but fails to be so because of 
measurement errors in the entries of A or b. In such cases one looks for vectors that come 
as close as possible to being solutions in the sense that they minimize ||b - Ax|| with respect 
to the Euclidean inner product on R^m. In this section we will discuss methods for finding 
such minimizing vectors. 


Least Squares Solutions of Linear Systems 

Suppose that Ax = b is an inconsistent linear system of m equations in n unknowns in 
which we suspect the inconsistency to be caused by errors in the entries of A or b. Since 
no exact solution is possible, we will look for a vector x that comes as “close as possible” 
to being a solution in the sense that it minimizes ||b - Ax|| with respect to the Euclidean 
inner product on R^m. You can think of Ax as an approximation to b and ||b - Ax|| 
as the error in that approximation: the smaller the error, the better the approximation. 
This leads to the following problem. 


If a linear system is consistent, 
then its exact solutions are the 
same as its least squares solu- 
tions, in which case the least 
squares error is zero. 


Least Squares Problem Given a linear system Ax = b of m equations in n 
unknowns, find a vector x in R^n that minimizes ||b - Ax|| with respect to the Euclidean 
inner product on R^m. We call such a vector, if it exists, a least squares solution of 
Ax = b, we call b - Ax the least squares error vector, and we call ||b - Ax|| the least 
squares error. 


To explain the terminology in this problem, suppose that the column form of b - Ax is 

\[ \mathbf{b} - A\mathbf{x} = \begin{bmatrix} e_1 \\ e_2 \\ \vdots \\ e_m \end{bmatrix} \]

The term “least squares solution” results from the fact that minimizing ||b - Ax|| also 
has the effect of minimizing ||b - Ax||^2 = e_1^2 + e_2^2 + ... + e_m^2. 

What is important to keep in mind about the least squares problem is that for every 
vector x in R^n, the product Ax is in the column space of A because it is a linear 
combination of the column vectors of A. That being the case, to find a least squares 
solution of Ax = b is equivalent to finding a vector Ax in the column space of A that 
is closest to b in the sense that it minimizes the length of the vector b - Ax. This is 
illustrated in Figure 6.4.1a, which also suggests that Ax is the orthogonal projection of 
b on the column space of A, that is, Ax = proj_col(A) b (Figure 6.4.1b). The next theorem 
will confirm this conjecture. 




THEOREM 6.4.1 Best Approximation Theorem 

If W is a finite-dimensional subspace of an inner product space V, and if b is a 
vector in V, then proj_W b is the best approximation to b from W in the sense that 

\[ \|\mathbf{b} - \operatorname{proj}_W \mathbf{b}\| < \|\mathbf{b} - \mathbf{w}\| \]

for every vector w in W that is different from proj_W b. 


Proof For every vector w in W, we can write 

\[ \mathbf{b} - \mathbf{w} = (\mathbf{b} - \operatorname{proj}_W \mathbf{b}) + (\operatorname{proj}_W \mathbf{b} - \mathbf{w}) \tag{1} \]

But proj_W b - w, being a difference of vectors in W, is itself in W; and since b - proj_W b 
is orthogonal to W, the two terms on the right side of (1) are orthogonal. Thus, it follows 
from the Theorem of Pythagoras (Theorem 6.2.3) that 

\[ \|\mathbf{b} - \mathbf{w}\|^2 = \|\mathbf{b} - \operatorname{proj}_W \mathbf{b}\|^2 + \|\operatorname{proj}_W \mathbf{b} - \mathbf{w}\|^2 \]

If w ≠ proj_W b, it follows that the second term in this sum is positive, and hence that 

\[ \|\mathbf{b} - \operatorname{proj}_W \mathbf{b}\|^2 < \|\mathbf{b} - \mathbf{w}\|^2 \]

Since norms are nonnegative, it follows (from a property of inequalities) that 

\[ \|\mathbf{b} - \operatorname{proj}_W \mathbf{b}\| < \|\mathbf{b} - \mathbf{w}\| \]
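The inequality in the Best Approximation Theorem can be checked numerically. The following is a minimal sketch, assuming the Euclidean inner product on R^3 and a plane W spanned by two orthogonal vectors; the vectors `v1`, `v2`, `b` and the helper names `project` and `dist_sq` are hypothetical and chosen only for illustration. It projects b onto W and then confirms that no other sampled vector of W is as close to b.

```python
from fractions import Fraction as F

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def project(b, orthogonal_basis):
    """Orthogonal projection of b onto the span of an orthogonal set of vectors."""
    p = [F(0)] * len(b)
    for v in orthogonal_basis:
        c = F(dot(b, v), dot(v, v))          # coefficient <b, v> / ||v||^2
        p = [pi + c * vi for pi, vi in zip(p, v)]
    return p

def dist_sq(u, v):
    """Squared Euclidean distance between u and v."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

# Hypothetical data: v1 and v2 are orthogonal and span a plane W in R^3.
v1, v2 = (1, 1, 0), (1, -1, 1)
b = (3, 1, 2)

p = project(b, [v1, v2])
best = dist_sq(b, p)

# Compare against a grid of other vectors w = s*v1 + t*v2 in W.
grid = [[s * x + t * y for x, y in zip(v1, v2)]
        for s in range(-3, 4) for t in range(-3, 4)]
closer = [w for w in grid if dist_sq(b, w) <= best]
```

Exact rational arithmetic is used so that the comparison is not affected by rounding; the list `closer` collects any sampled vector that would contradict the theorem.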


It follows from Theorem 6.4.1 that if V = R^m and W = col(A), then the best 
approximation to b from col(A) is proj_col(A) b. But every vector in the column space of A is 
expressible in the form Ax for some vector x, so there is at least one vector x in R^n for 
which Ax = proj_col(A) b. Each such vector is a least squares solution of Ax = b. Note, 
however, that although there may be more than one least squares solution of Ax = b, 
each such solution x has the same error vector b - Ax. 


One way to find a least squares solution of Ax = b is to calculate the orthogonal 
projection proj_W b on the column space W of A and then solve the equation 

\[ A\mathbf{x} = \operatorname{proj}_W \mathbf{b} \tag{2} \]

However, we can avoid calculating the projection by rewriting (2) as 

\[ \mathbf{b} - A\mathbf{x} = \mathbf{b} - \operatorname{proj}_W \mathbf{b} \]

and then multiplying both sides of this equation by A^T to obtain 

\[ A^T(\mathbf{b} - A\mathbf{x}) = A^T(\mathbf{b} - \operatorname{proj}_W \mathbf{b}) \tag{3} \]

Since b - proj_W b is the component of b that is orthogonal to the column space of A, 
it follows from Theorem 4.8.7(b) that this vector lies in the null space of A^T, and hence 
that 

\[ A^T(\mathbf{b} - \operatorname{proj}_W \mathbf{b}) = \mathbf{0} \]

Thus, (3) simplifies to 

\[ A^T(\mathbf{b} - A\mathbf{x}) = \mathbf{0} \]

which we can rewrite as 

\[ A^T A\mathbf{x} = A^T\mathbf{b} \tag{4} \]

This is called the normal equation or the normal system associated with Ax = b. When 
viewed as a linear system, the individual equations are called the normal equations 
associated with Ax = b. 

In summary, we have established the following result. 

THEOREM 6.4.2 For every linear system Ax = b, the associated normal system 

\[ A^T A\mathbf{x} = A^T\mathbf{b} \tag{5} \]

is consistent, and all solutions of (5) are least squares solutions of Ax = b. Moreover, 
if W is the column space of A, and x is any least squares solution of Ax = b, then the 
orthogonal projection of b on W is 

\[ \operatorname{proj}_W \mathbf{b} = A\mathbf{x} \tag{6} \]


► EXAMPLE 1 Unique Least Squares Solution 

Find the least squares solution, the least squares error vector, and the least squares error 
of the linear system 

\[ \begin{aligned} x_1 - x_2 &= 4 \\ 3x_1 + 2x_2 &= 1 \\ -2x_1 + 4x_2 &= 3 \end{aligned} \]


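Solution It will be convenient to express the system in the matrix form Ax = b, where 

\[ A = \begin{bmatrix} 1 & -1 \\ 3 & 2 \\ -2 & 4 \end{bmatrix} \quad\text{and}\quad \mathbf{b} = \begin{bmatrix} 4 \\ 1 \\ 3 \end{bmatrix} \tag{7} \]

It follows that 

\[ A^T A = \begin{bmatrix} 1 & 3 & -2 \\ -1 & 2 & 4 \end{bmatrix} \begin{bmatrix} 1 & -1 \\ 3 & 2 \\ -2 & 4 \end{bmatrix} = \begin{bmatrix} 14 & -3 \\ -3 & 21 \end{bmatrix}, \qquad A^T\mathbf{b} = \begin{bmatrix} 1 & 3 & -2 \\ -1 & 2 & 4 \end{bmatrix} \begin{bmatrix} 4 \\ 1 \\ 3 \end{bmatrix} = \begin{bmatrix} 1 \\ 10 \end{bmatrix} \tag{8} \]

so the normal system A^T A x = A^T b is 

\[ \begin{bmatrix} 14 & -3 \\ -3 & 21 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 1 \\ 10 \end{bmatrix} \]

Solving this system yields a unique least squares solution, namely, 

\[ x_1 = \tfrac{17}{95}, \qquad x_2 = \tfrac{143}{285} \]

The least squares error vector is 

\[ \mathbf{b} - A\mathbf{x} = \begin{bmatrix} 4 \\ 1 \\ 3 \end{bmatrix} - \begin{bmatrix} 1 & -1 \\ 3 & 2 \\ -2 & 4 \end{bmatrix} \begin{bmatrix} \frac{17}{95} \\ \frac{143}{285} \end{bmatrix} = \begin{bmatrix} 4 \\ 1 \\ 3 \end{bmatrix} - \begin{bmatrix} -\frac{92}{285} \\ \frac{439}{285} \\ \frac{94}{57} \end{bmatrix} = \begin{bmatrix} \frac{1232}{285} \\ -\frac{154}{285} \\ \frac{77}{57} \end{bmatrix} \]

and the least squares error is 

\[ \|\mathbf{b} - A\mathbf{x}\| \approx 4.561 \]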
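The computation in Example 1 can be reproduced with exact rational arithmetic. The following is a minimal sketch, not a general-purpose solver: it forms A^T A and A^T b for the system of Example 1 and solves the resulting 2 x 2 normal system by Cramer's rule. The helper names `transpose`, `matmul`, and `solve_2x2` are hypothetical and introduced only for this illustration.

```python
from fractions import Fraction as F

def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(X, Y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

def solve_2x2(G, h):
    """Solve G z = h for an invertible 2x2 matrix G by Cramer's rule."""
    (a, b), (c, d) = G
    det = a * d - b * c
    return [F(h[0] * d - b * h[1], det), F(a * h[1] - c * h[0], det)]

# Coefficient matrix and right-hand side of the system in Example 1.
A = [[1, -1], [3, 2], [-2, 4]]
b = [4, 1, 3]

At = transpose(A)
AtA = matmul(At, A)                                    # 2x2 matrix A^T A
Atb = [row[0] for row in matmul(At, [[x] for x in b])]  # vector A^T b
x = solve_2x2(AtA, Atb)                                # least squares solution

# Error vector b - Ax and its Euclidean norm.
error = [bi - sum(a * xi for a, xi in zip(row, x)) for row, bi in zip(A, b)]
error_norm = float(sum(e * e for e in error)) ** 0.5
```

A useful sanity check on any computed least squares solution is that the error vector is orthogonal to every column of A, which is exactly what the normal system expresses.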


The computations in the next example are a little tedious for hand computation, so 
in the absence of a calculating utility you may want to just read through it for its ideas and 
logical flow. 

► EXAMPLE 2 Infinitely Many Least Squares Solutions 

Find the least squares solutions, the least squares error vector, and the least squares error 
of the linear system 

\[ \begin{aligned} 3x_1 + 2x_2 - x_3 &= 2 \\ x_1 - 4x_2 + 3x_3 &= -2 \\ x_1 + 10x_2 - 7x_3 &= 1 \end{aligned} \]

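Solution The matrix form of the system is Ax = b, where 

\[ A = \begin{bmatrix} 3 & 2 & -1 \\ 1 & -4 & 3 \\ 1 & 10 & -7 \end{bmatrix} \quad\text{and}\quad \mathbf{b} = \begin{bmatrix} 2 \\ -2 \\ 1 \end{bmatrix} \]

It follows that 

\[ A^T A = \begin{bmatrix} 11 & 12 & -7 \\ 12 & 120 & -84 \\ -7 & -84 & 59 \end{bmatrix} \quad\text{and}\quad A^T\mathbf{b} = \begin{bmatrix} 5 \\ 22 \\ -15 \end{bmatrix} \]

so the augmented matrix for the normal system A^T A x = A^T b is 

\[ \left[\begin{array}{rrr|r} 11 & 12 & -7 & 5 \\ 12 & 120 & -84 & 22 \\ -7 & -84 & 59 & -15 \end{array}\right] \]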

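The reduced row echelon form of this matrix is 

\[ \left[\begin{array}{rrr|r} 1 & 0 & \frac{1}{7} & \frac{2}{7} \\ 0 & 1 & -\frac{5}{7} & \frac{13}{84} \\ 0 & 0 & 0 & 0 \end{array}\right] \]

from which it follows that there are infinitely many least squares solutions, and that they 
are given by the parametric equations 

\[ \begin{aligned} x_1 &= \tfrac{2}{7} - \tfrac{1}{7}t \\ x_2 &= \tfrac{13}{84} + \tfrac{5}{7}t \\ x_3 &= t \end{aligned} \]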

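As a check, let us verify that all least squares solutions produce the same least squares 
error vector and the same least squares error. To see that this is so, we first compute 

\[ \mathbf{b} - A\mathbf{x} = \begin{bmatrix} 2 \\ -2 \\ 1 \end{bmatrix} - \begin{bmatrix} 3 & 2 & -1 \\ 1 & -4 & 3 \\ 1 & 10 & -7 \end{bmatrix} \begin{bmatrix} \frac{2}{7} - \frac{1}{7}t \\ \frac{13}{84} + \frac{5}{7}t \\ t \end{bmatrix} = \begin{bmatrix} 2 \\ -2 \\ 1 \end{bmatrix} - \begin{bmatrix} \frac{7}{6} \\ -\frac{1}{3} \\ \frac{11}{6} \end{bmatrix} = \begin{bmatrix} \frac{5}{6} \\ -\frac{5}{3} \\ -\frac{5}{6} \end{bmatrix} \]

Since b - Ax does not depend on t, all least squares solutions produce the same error 
vector and hence the same least squares error, namely 

\[ \|\mathbf{b} - A\mathbf{x}\| = \sqrt{\left(\tfrac{5}{6}\right)^2 + \left(-\tfrac{5}{3}\right)^2 + \left(-\tfrac{5}{6}\right)^2} = \tfrac{5}{6}\sqrt{6} \]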
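The claim that every member of the one-parameter family in Example 2 has the same error vector can be tested directly. The sketch below assumes the parametric form x1 = 2/7 - t/7, x2 = 13/84 + 5t/7, x3 = t; the function names `solution`, `error_vector`, and `satisfies_normal_system` are hypothetical helpers written for this illustration.

```python
from fractions import Fraction as F

# Coefficient matrix and right-hand side of the system in Example 2.
A = [[3, 2, -1], [1, -4, 3], [1, 10, -7]]
b = [2, -2, 1]

def solution(t):
    """Least squares solution of Example 2 for the parameter value t."""
    t = F(t)
    return [F(2, 7) - t / 7, F(13, 84) + 5 * t / 7, t]

def error_vector(x):
    """Return b - Ax."""
    return [bi - sum(a * xi for a, xi in zip(row, x)) for row, bi in zip(A, b)]

def satisfies_normal_system(x):
    """A^T (b - Ax) = 0 means the error is orthogonal to every column of A."""
    r = error_vector(x)
    return all(sum(A[i][j] * r[i] for i in range(3)) == 0 for j in range(3))

# Error vectors for several sample parameter values; they should all coincide.
errors = [error_vector(solution(t)) for t in (-2, 0, 1, 5)]
```

Sampling a few values of t is only a spot check, not a proof; the symbolic computation in the example is what shows that the parameter cancels.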


We know from Theorem 6.4.2 that the system A^T A x = A^T b of normal equations that is 
associated with the system Ax = b is consistent. Thus, it follows from Theorem 1.6.1 
that every linear system Ax = b has either one least squares solution (as in Example 1) or 
infinitely many least squares solutions (as in Example 2). Since A^T A is a square matrix, 
the former occurs if A^T A is invertible and the latter if it is not. The next two theorems 
are concerned with this idea. 

THEOREM 6.4.3 If A is an m x n matrix, then the following are equivalent. 

(a) The column vectors of A are linearly independent. 

(b) A^T A is invertible. 

Proof We will prove that (a) ⇒ (b) and leave the proof that (b) ⇒ (a) as an exercise. 

(a) ⇒ (b) Assume that the column vectors of A are linearly independent. The matrix 
A^T A has size n x n, so we can prove that this matrix is invertible by showing that the linear 
system A^T A x = 0 has only the trivial solution. But if x is any solution of this system, then 
Ax is in the null space of A^T and also in the column space of A. By Theorem 4.8.7(b) 
these spaces are orthogonal complements, so part (b) of Theorem 6.2.4 implies that 
Ax = 0. But A is assumed to have linearly independent column vectors, so x = 0 by 
Theorem 1.3.1. 
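Theorem 6.4.3 can be illustrated with the coefficient matrices of Examples 1 and 2: the first has linearly independent columns and the second does not. The following sketch, assuming a simple cofactor-expansion determinant (the helper `det` is hypothetical and suitable only for small matrices), computes det(A^T A) in both cases.

```python
def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(X, Y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

def det(M):
    """Determinant by cofactor expansion along the first row (small matrices only)."""
    if len(M) == 1:
        return M[0][0]
    total = 0
    for j, entry in enumerate(M[0]):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * entry * det(minor)
    return total

A1 = [[1, -1], [3, 2], [-2, 4]]              # matrix of Example 1
A2 = [[3, 2, -1], [1, -4, 3], [1, 10, -7]]   # matrix of Example 2

gram1 = matmul(transpose(A1), A1)   # A1^T A1
gram2 = matmul(transpose(A2), A2)   # A2^T A2
det1, det2 = det(gram1), det(gram2)
```

A nonzero value of `det1` corresponds to the unique least squares solution found in Example 1, while a zero value of `det2` corresponds to the infinitely many solutions of Example 2.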

The next theorem, which follows directly from Theorems 6.4.2 and 6.4.3, gives an 
explicit formula for the least squares solution of a linear system in which the coefficient 
matrix has linearly independent column vectors. 


THEOREM 6.4.4 If A is an m x n matrix with linearly independent column vectors, 
then for every m x 1 matrix b, the linear system Ax = b has a unique least squares 
solution. This solution is given by 

\[ \mathbf{x} = (A^T A)^{-1} A^T \mathbf{b} \tag{9} \]

Moreover, if W is the column space of A, then the orthogonal projection of b on W is 

\[ \operatorname{proj}_W \mathbf{b} = A\mathbf{x} = A(A^T A)^{-1} A^T \mathbf{b} \tag{10} \]


► EXAMPLE 3 A Formula Solution to Example 1 

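Use Formula (9) and the matrices in Formulas (7) and (8) to find the least squares solution 
of the linear system in Example 1. 


Solution We leave it for you to verify that 

\[ \mathbf{x} = (A^T A)^{-1} A^T \mathbf{b} = \begin{bmatrix} 14 & -3 \\ -3 & 21 \end{bmatrix}^{-1} \begin{bmatrix} 1 & 3 & -2 \\ -1 & 2 & 4 \end{bmatrix} \begin{bmatrix} 4 \\ 1 \\ 3 \end{bmatrix} = \frac{1}{285} \begin{bmatrix} 21 & 3 \\ 3 & 14 \end{bmatrix} \begin{bmatrix} 1 & 3 & -2 \\ -1 & 2 & 4 \end{bmatrix} \begin{bmatrix} 4 \\ 1 \\ 3 \end{bmatrix} = \begin{bmatrix} \frac{17}{95} \\ \frac{143}{285} \end{bmatrix} \]

which agrees with the result obtained in Example 1. 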

It follows from Formula (10) that the standard matrix for the orthogonal projection 
on the column space of a matrix A is 

\[ P = A(A^T A)^{-1} A^T \tag{11} \]

We will use this result in the next example. 
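Formula (11) can be evaluated exactly for a small matrix. The sketch below uses the 3 x 2 matrix A from Example 1 as sample data and builds P = A(A^T A)^{-1} A^T with rational arithmetic; the helper `inverse_2x2` is a hypothetical convenience that assumes A^T A is an invertible 2 x 2 matrix.

```python
from fractions import Fraction as F

def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(X, Y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

def inverse_2x2(G):
    """Inverse of an invertible 2x2 matrix via the adjugate formula."""
    (a, b), (c, d) = G
    det = F(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]

A = [[1, -1], [3, 2], [-2, 4]]      # sample matrix with independent columns
At = transpose(A)

# P = A (A^T A)^{-1} A^T, the standard matrix of the projection onto col(A)
P = matmul(matmul(A, inverse_2x2(matmul(At, A))), At)

b = [[4], [1], [3]]
proj_b = [row[0] for row in matmul(P, b)]   # orthogonal projection of b on col(A)
```

Two properties worth checking on the result are that P is symmetric and that applying P twice gives the same matrix as applying it once, as expected of an orthogonal projection.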

► EXAMPLE 4 Orthogonal Projection on a Column Space 

We showed in Formula (4) of Section 4.9 that the standard matrix for the orthogonal 
projection onto the line W through the origin of R^2 that makes an angle θ with the 
positive x-axis is 

\[ P_\theta = \begin{bmatrix} \cos^2\theta & \sin\theta\cos\theta \\ \sin\theta\cos\theta & \sin^2\theta \end{bmatrix} \]

Derive this result using Formula (11). 

[Figure 6.4.2] 

Solution To apply Formula (11) we must find a matrix A for which the line W is the 
column space. Since the line is one-dimensional and consists of all scalar multiples of 
the vector w = (cos θ, sin θ) (see Figure 6.4.2), we can take A to be 

\[ A = \begin{bmatrix} \cos\theta \\ \sin\theta \end{bmatrix} \]

Since A^T A is the 1 x 1 identity matrix (verify), it follows that 

\[ A(A^T A)^{-1} A^T = AA^T = \begin{bmatrix} \cos\theta \\ \sin\theta \end{bmatrix} \begin{bmatrix} \cos\theta & \sin\theta \end{bmatrix} = \begin{bmatrix} \cos^2\theta & \sin\theta\cos\theta \\ \sin\theta\cos\theta & \sin^2\theta \end{bmatrix} = P_\theta \]
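The derivation in Example 4 can be mirrored numerically. The sketch below forms the outer product A A^T for the unit column vector A = (cos θ, sin θ) and applies it to a vector; the angle π/6 and the vector (1, 5) are illustrative choices, and the helper names `projection_onto_line` and `apply` are hypothetical.

```python
import math

def projection_onto_line(theta):
    """Standard matrix A A^T for the unit column vector A = (cos theta, sin theta)."""
    a = (math.cos(theta), math.sin(theta))
    return [[ai * aj for aj in a] for ai in a]

def apply(P, x):
    """Matrix-vector product P x."""
    return [sum(pij * xj for pij, xj in zip(row, x)) for row in P]

theta = math.pi / 6          # illustrative angle
P = projection_onto_line(theta)
x = (1.0, 5.0)               # illustrative vector
p = apply(P, x)              # projection of x onto the line
residual = [xi - pi for xi, pi in zip(x, p)]
```

Because floating-point arithmetic is used here, comparisons with the closed form for P_θ should be made within a small tolerance rather than with exact equality.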


More on the Equivalence Theorem 

As our final result in the main part of this section we will add one additional part to 
Theorem 5.1.5. 


THEOREM 6.4.5 Equivalent Statements 

If A is an n x n matrix, then the following statements are equivalent. 

(a) A is invertible. 

(b) Ax = 0 has only the trivial solution. 

(c) The reduced row echelon form of A is I_n. 

(d) A is expressible as a product of elementary matrices. 

(e) Ax = b is consistent for every n x 1 matrix b. 

(f) Ax = b has exactly one solution for every n x 1 matrix b. 

(g) det(A) ≠ 0. 

(h) The column vectors of A are linearly independent. 

(i) The row vectors of A are linearly independent. 

(j) The column vectors of A span R^n. 

(k) The row vectors of A span R^n. 

(l) The column vectors of A form a basis for R^n. 

(m) The row vectors of A form a basis for R^n. 

(n) A has rank n. 

(o) A has nullity 0. 

(p) The orthogonal complement of the null space of A is R^n. 

(q) The orthogonal complement of the row space of A is {0}. 

(r) The kernel of T_A is {0}. 

(s) The range of T_A is R^n. 

(t) T_A is one-to-one. 

(u) λ = 0 is not an eigenvalue of A. 

(v) A^T A is invertible. 

The proof of part (v) follows from part (h) of this theorem and Theorem 6.4.3 applied 
to square matrices. 


OPTIONAL 
Another View of Least Squares 

Recall from Theorem 4.8.7 that the null space and row space of an m x n matrix A are 
orthogonal complements, as are the null space of A^T and the column space of A. Thus, 
given a linear system Ax = b in which A is an m x n matrix, the Projection Theorem 
(6.3.3) tells us that the vectors x and b can each be decomposed into sums of orthogonal 
terms as 

\[ \mathbf{x} = \mathbf{x}_{\operatorname{row}(A)} + \mathbf{x}_{\operatorname{null}(A)} \quad\text{and}\quad \mathbf{b} = \mathbf{b}_{\operatorname{null}(A^T)} + \mathbf{b}_{\operatorname{col}(A)} \]

where x_row(A) and x_null(A) are the orthogonal projections of x on the row space of A and 
the null space of A, and the vectors b_null(A^T) and b_col(A) are the orthogonal projections 
of b on the null space of A^T and the column space of A. 

In Figure 6.4.3 we have represented the fundamental spaces of A by perpendicular 
lines in R^n and R^m on which we indicated the orthogonal projections of x and b. (This, 
of course, is only pictorial since the fundamental spaces need not be one-dimensional.) 
The figure shows Ax as a point in the column space of A and conveys that b_col(A) is the 
point in col(A) that is closest to b. This illustrates that the least squares solutions of 
Ax = b are the exact solutions of the equation Ax = b_col(A). 

[Figure 6.4.3: axes labeled null(A), row(A), col(A), and null(A^T)] 


OPTIONAL 
The Role of QR-Decomposition in Least Squares Problems 

Formulas (9) and (10) have theoretical use but are not well suited for numerical 
computation. In practice, least squares solutions of Ax = b are typically found by using 
some variation of Gaussian elimination to solve the normal equations or by using 
QR-decomposition and the following theorem. 


THEOREM 6.4.6 If A is an m x n matrix with linearly independent column vectors, and 
if A = QR is a QR-decomposition of A (see Theorem 6.3.7), then for each b in R^m the 
system Ax = b has a unique least squares solution given by 

\[ \mathbf{x} = R^{-1} Q^T \mathbf{b} \tag{12} \]


A proof of this theorem and a discussion of its use can be found in many books on 
numerical methods of linear algebra. However, you can obtain Formula (12) by making 
the substitution A = QR in (9) and using the fact that Q^T Q = I to obtain 

\[ \begin{aligned} \mathbf{x} &= \left((QR)^T (QR)\right)^{-1} (QR)^T \mathbf{b} \\ &= (R^T Q^T Q R)^{-1} (QR)^T \mathbf{b} \\ &= R^{-1} (R^T)^{-1} R^T Q^T \mathbf{b} \\ &= R^{-1} Q^T \mathbf{b} \end{aligned} \]
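In computation, Formula (12) is normally applied without forming R^{-1}: since R is upper triangular, the system Rx = Q^T b can be solved by back substitution. The following is a minimal sketch under that assumption. The function name `least_squares_qr` and the matrices `Q`, `R`, and `b` are hypothetical, chosen so that Q has orthonormal columns and QR equals the matrix with rows (1, 3), (2, 3), (2, 0).

```python
from fractions import Fraction as F

def least_squares_qr(Q, R, b):
    """Return x solving R x = Q^T b by back substitution.

    Assumes Q has orthonormal columns and R is upper triangular
    with nonzero diagonal entries.
    """
    m, n = len(Q), len(R)
    y = [sum(Q[i][j] * b[i] for i in range(m)) for j in range(n)]   # y = Q^T b
    x = [F(0)] * n
    for k in reversed(range(n)):
        tail = sum(R[k][j] * x[j] for j in range(k + 1, n))
        x[k] = (y[k] - tail) / R[k][k]
    return x

# Hypothetical example: A = QR with A = [[1, 3], [2, 3], [2, 0]].
Q = [[F(1, 3), F(2, 3)],
     [F(2, 3), F(1, 3)],
     [F(2, 3), F(-2, 3)]]
R = [[3, 3],
     [0, 3]]
b = [1, 1, 1]
x = least_squares_qr(Q, R, b)
```

Back substitution touches each entry of R once, which is one reason the QR route is preferred to computing (A^T A)^{-1} explicitly.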
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Exercise Set 6.4 


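In Exercises 1-2, find the associated normal equation. 

1. \[ \begin{bmatrix} 1 & -1 \\ 2 & 3 \\ 4 & 5 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 2 \\ -1 \\ 5 \end{bmatrix} \]

2. \[ \begin{bmatrix} 2 & -1 & 0 \\ 3 & 1 & 2 \\ -1 & 4 & 5 \\ 1 & 2 & 4 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} -1 \\ 0 \\ 1 \\ 2 \end{bmatrix} \]
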

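In Exercises 3-6, find the least squares solution of the equation 
Ax = b. 

3. \[ A = \begin{bmatrix} 1 & -1 \\ 2 & 3 \\ 4 & 5 \end{bmatrix}; \quad \mathbf{b} = \begin{bmatrix} 2 \\ -1 \\ 5 \end{bmatrix} \]

4. \[ A = \begin{bmatrix} 2 & -2 \\ 1 & 1 \\ 3 & 1 \end{bmatrix}; \quad \mathbf{b} = \begin{bmatrix} 2 \\ -1 \\ 1 \end{bmatrix} \]

5. \[ A = \begin{bmatrix} 1 & 0 & -1 \\ 2 & 1 & -2 \\ 1 & 1 & 0 \\ 1 & 1 & -1 \end{bmatrix}; \quad \mathbf{b} = \begin{bmatrix} 6 \\ 0 \\ 9 \\ 3 \end{bmatrix} \]

6. \[ A = \begin{bmatrix} 2 & 0 & -1 \\ 1 & -2 & 2 \\ 2 & -1 & 0 \\ 0 & 1 & -1 \end{bmatrix}; \quad \mathbf{b} = \begin{bmatrix} 0 \\ 6 \\ 0 \\ 6 \end{bmatrix} \]
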

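In Exercises 7-10, find the least squares error vector and least 
squares error of the stated equation. Verify that the least squares 
error vector is orthogonal to the column space of A. 

7. The equation in Exercise 3. 

13. \[ A = \begin{bmatrix} -1 & 3 & 2 \\ 2 & 1 & 3 \\ 0 & 1 & 1 \end{bmatrix}; \quad \mathbf{b} = \begin{bmatrix} 7 \\ 0 \\ -7 \end{bmatrix} \]

14. \[ A = \begin{bmatrix} 3 & 2 & -1 \\ 1 & -4 & 3 \\ 1 & 10 & -7 \end{bmatrix}; \quad \mathbf{b} = \begin{bmatrix} 2 \\ -2 \\ 1 \end{bmatrix} \]
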

In Exercises 15-16, use Theorem 6.4.2 to find the orthogonal 
projection of b on the column space of A, and check your result 
using Theorem 6.4.4. 



16. A = 



-4 

2 

3 


17. Find the orthogonal projection of u on the subspace of R^3 
spanned by the vectors v1 and v2. 

u = (1, -6, 1); v1 = (-1, 2, 1), v2 = (2, 2, 4) 


18. Find the orthogonal projection of u on the subspace of R^4 
spanned by the vectors v1, v2, and v3. 

u = (6, 3, 9, 6); v1 = (2, 1, 1, 1), v2 = (1, 0, 1, 1), 
v3 = (-2, -1, 0, -1) 


In Exercises 19-20, use the method of Example 3 to find the 
standard matrix for the orthogonal projection on the stated sub- 
space of R 2 . Compare your result to that in Table 3 of Section 
4.9. 


19. the x-axis 20. the y-axis 

In Exercises 21-22, use the method of Example 3 to find the 
standard matrix for the orthogonal projection on the stated sub- 
space of R 3 . Compare your result to that in Table 4 of Section 
4.9. 


8. The equation in Exercise 4. 

9. The equation in Exercise 5. 

10. The equation in Exercise 6. 

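In Exercises 11-14, find parametric equations for all least 
squares solutions of Ax = b, and confirm that all of the solutions 
have the same error vector. 

11. \[ A = \begin{bmatrix} 2 & 1 \\ 4 & 2 \\ -2 & -1 \end{bmatrix}; \quad \mathbf{b} = \begin{bmatrix} 3 \\ 2 \\ 1 \end{bmatrix} \]

12. \[ A = \begin{bmatrix} 1 & 3 \\ -2 & -6 \\ 3 & 9 \end{bmatrix}; \quad \mathbf{b} = \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix} \]
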

21. the xz-plane 22. the yz-plane 

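In Exercises 23-24, a QR-factorization of A is given. Use it to 
find the least squares solution of Ax = b. 

23. \[ A = \begin{bmatrix} 3 & 1 \\ -4 & 1 \end{bmatrix} = \begin{bmatrix} \frac{3}{5} & \frac{4}{5} \\ -\frac{4}{5} & \frac{3}{5} \end{bmatrix} \begin{bmatrix} 5 & -\frac{1}{5} \\ 0 & \frac{7}{5} \end{bmatrix}; \quad \mathbf{b} = \begin{bmatrix} 3 \\ 2 \end{bmatrix} \]

24. \[ A = \begin{bmatrix} 3 & -6 \\ 4 & -8 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} \frac{3}{5} & 0 \\ \frac{4}{5} & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 5 & -10 \\ 0 & 1 \end{bmatrix}; \quad \mathbf{b} = \begin{bmatrix} -1 \\ 7 \\ 2 \end{bmatrix} \]
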

25. Let W be the plane with equation 5x — 3y + z = 0. 

(a) Find a basis for W . 

(b) Find the standard matrix for the orthogonal projection 
onto W. 


26. Let W be the line with parametric equations 

x = 2t, y = -t, z = 4t 

(a) Find a basis for W. 

(b) Find the standard matrix for the orthogonal projection 
on W. 

27. Find the orthogonal projection of u = (5, 6, 7, 2) on the 
solution space of the homogeneous linear system 

x_1 + x_2 + x_3 = 0 

2x_2 + x_3 + x_4 = 0 

28. Show that if w = (a, b, c) is a nonzero vector, then the standard 
matrix for the orthogonal projection of R^3 onto the line 
span(w) is 

\[ P = \frac{1}{a^2 + b^2 + c^2} \begin{bmatrix} a^2 & ab & ac \\ ab & b^2 & bc \\ ac & bc & c^2 \end{bmatrix} \]

29. Let A be an m x n matrix with linearly independent row vectors. 
Find a standard matrix for the orthogonal projection of 
R^n onto the row space of A. 

Working with Proofs 

30. Prove: If A has linearly independent column vectors, and if 
Ax = b is consistent, then the least squares solution of Ax = b 
and the exact solution of Ax = b are the same. 

31. Prove: If A has linearly independent column vectors, and if b 
is orthogonal to the column space of A, then the least squares 
solution of Ax = b is x = 0. 

32. Prove the implication (b) ⇒ (a) of Theorem 6.4.3. 

True-False Exercises 

TF. In parts (a)-(h) determine whether the statement is true or 
false, and justify your answer. 

(a) If A is an m x n matrix, then A^T A is a square matrix. 

(b) If A^T A is invertible, then A is invertible. 

(c) If A is invertible, then A^T A is invertible. 

(d) If Ax = b is a consistent linear system, then 
A^T A x = A^T b is also consistent. 

(e) If Ax = b is an inconsistent linear system, then 
A^T A x = A^T b is also inconsistent. 

(f) Every linear system has a least squares solution. 

(g) Every linear system has a unique least squares solution. 

(h) If A is an m x n matrix with linearly independent columns and 
b is in R^m, then Ax = b has a unique least squares solution. 

Working with Technology 

T1. (a) Use Theorem 6.4.4 to show that the following linear system 
has a unique least squares solution, and use the method 
of Example 1 to find it. 

x_1 + x_2 + x_3 = 1 
4x_1 + 2x_2 + x_3 = 10 
9x_1 + 3x_2 + x_3 = 9 
16x_1 + 4x_2 + x_3 = 16 

(b) Check your result in part (a) using Formula (9). 

T2. Use your technology utility to perform the computations and 
confirm the results obtained in Example 2. 


6.5 Mathematical Modeling Using Least Squares 

In this section we will use results about orthogonal projections in inner product spaces to 
obtain a technique for fitting a line or other polynomial curve to a set of experimentally 
determined points in the plane. 


Fitting a Curve to Data 

A common problem in experimental work is to obtain a mathematical relationship 
y = f(x) between two variables x and y by “fitting” a curve to points in the plane 
corresponding to various experimentally determined values of x and y, say 

(x_1, y_1), (x_2, y_2), ..., (x_n, y_n) 

On the basis of theoretical considerations or simply by observing the pattern of the 
points, the experimenter decides on the general form of the curve y = f(x) to be fitted. 
This curve is called a mathematical model of the data. Some examples are (Figure 6.5.1): 

(a) A straight line: y = a + bx 

(b) A quadratic polynomial: y = a + bx + cx^2 

(c) A cubic polynomial: y = a + bx + cx^2 + dx^3 

[Figure 6.5.1] 


Least Squares Fit of a Straight Line 

When data points are obtained experimentally, there is generally some measurement 
“error,” making it impossible to find a curve of the desired form that passes through all 
the points. Thus, the idea is to choose the curve (by determining its coefficients) that 
“best fits” the data. We begin with the simplest case: fitting a straight line to data points. 

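Suppose we want to fit a straight line y = a + bx to the experimentally determined 
points 

(x_1, y_1), (x_2, y_2), ..., (x_n, y_n) 

If the data points were collinear, the line would pass through all n points, and the 
unknown coefficients a and b would satisfy the equations 

\[ \begin{aligned} y_1 &= a + bx_1 \\ y_2 &= a + bx_2 \\ &\ \,\vdots \\ y_n &= a + bx_n \end{aligned} \tag{1} \]

We can write this system in matrix form as 

\[ \begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix} \begin{bmatrix} a \\ b \end{bmatrix} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} \]

or more compactly as 

\[ M\mathbf{v} = \mathbf{y} \tag{2} \]

where 

\[ \mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}, \quad M = \begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix}, \quad \mathbf{v} = \begin{bmatrix} a \\ b \end{bmatrix} \tag{3} \]
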


If there are measurement errors in the data, then the data points will typically not lie 
on a line, and (1) will be inconsistent. In this case we look for a least squares 
approximation to the values of a and b by solving the normal system 

\[ M^T M\mathbf{v} = M^T\mathbf{y} \]

For simplicity, let us assume that the x-coordinates of the data points are not all the same, 
so M has linearly independent column vectors (Exercise 14) and the normal system has 
the unique solution 

\[ \mathbf{v}^* = \begin{bmatrix} a^* \\ b^* \end{bmatrix} = (M^T M)^{-1} M^T \mathbf{y} \]

[see Formula (9) of Theorem 6.4.4]. The line y = a* + b*x that results from this solution 
is called the least squares line of best fit or the regression line. It follows from (2) and (3) 
that this line minimizes 

\[ \|\mathbf{y} - M\mathbf{v}\|^2 = [y_1 - (a + bx_1)]^2 + [y_2 - (a + bx_2)]^2 + \cdots + [y_n - (a + bx_n)]^2 \]

The quantities 

\[ d_1 = |y_1 - (a + bx_1)|, \quad d_2 = |y_2 - (a + bx_2)|, \quad \ldots, \quad d_n = |y_n - (a + bx_n)| \]

are called residuals. Since the residual d_i is the distance between the data point (x_i, y_i) 
and the regression line (Figure 6.5.2), we can interpret its value as the “error” in y_i at 
the point x_i. If we assume that the value of each x_i is exact, then all the errors are in the 
y_i, so the regression line can be described as the line that minimizes the sum of the squares 
of the data errors, hence the name “least squares line of best fit.” In summary, we have 
the following theorem. 

[Figure 6.5.2: d_i measures the vertical error.] 


THEOREM 6.5.1 Uniqueness of the Least Squares Solution 

Let (x_1, y_1), (x_2, y_2), ..., (x_n, y_n) be a set of two or more data points, not all lying on 
a vertical line, and let 

\[ M = \begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix} \quad\text{and}\quad \mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} \tag{4} \]

Then there is a unique least squares straight line fit 

\[ y = a^* + b^*x \tag{5} \]

to the data points. Moreover, 

\[ \mathbf{v}^* = \begin{bmatrix} a^* \\ b^* \end{bmatrix} \tag{6} \]

is given by the formula 

\[ \mathbf{v}^* = (M^T M)^{-1} M^T \mathbf{y} \tag{7} \]

which expresses the fact that v = v* is the unique solution of the normal equation 

\[ M^T M\mathbf{v} = M^T\mathbf{y} \tag{8} \]

[Figure 6.5.3] 

[Figure 6.5.4] 


► EXAMPLE 1 Least Squares Straight Line Fit 

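Find the least squares straight line fit to the four points (0, 1), (1, 3), (2, 4), and (3, 4). 
(See Figure 6.5.3.) 


Solution We have 

\[ M = \begin{bmatrix} 1 & 0 \\ 1 & 1 \\ 1 & 2 \\ 1 & 3 \end{bmatrix}, \quad M^T M = \begin{bmatrix} 4 & 6 \\ 6 & 14 \end{bmatrix}, \quad\text{and}\quad (M^T M)^{-1} = \frac{1}{10} \begin{bmatrix} 7 & -3 \\ -3 & 2 \end{bmatrix} \]

\[ \mathbf{v}^* = (M^T M)^{-1} M^T \mathbf{y} = \frac{1}{10} \begin{bmatrix} 7 & -3 \\ -3 & 2 \end{bmatrix} \begin{bmatrix} 1 & 1 & 1 & 1 \\ 0 & 1 & 2 & 3 \end{bmatrix} \begin{bmatrix} 1 \\ 3 \\ 4 \\ 4 \end{bmatrix} = \begin{bmatrix} 1.5 \\ 1 \end{bmatrix} \]

so the desired line is y = 1.5 + x. 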
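The straight line fit of Example 1 can be reproduced by solving the 2 x 2 normal system in closed form. The sketch below assumes the x-values are not all equal, so that M^T M is invertible; the function name `fit_line` is a hypothetical helper, not notation from the text. It also computes the residuals d_i for the fitted line.

```python
from fractions import Fraction as F

def fit_line(points):
    """Least squares line y = a + b x via the 2x2 normal system M^T M v = M^T y."""
    n = len(points)
    sx = sum(x for x, _ in points)           # sum of x_i
    sxx = sum(x * x for x, _ in points)      # sum of x_i^2
    sy = sum(y for _, y in points)           # sum of y_i
    sxy = sum(x * y for x, y in points)      # sum of x_i y_i
    det = n * sxx - sx * sx                  # det(M^T M); nonzero if the x_i differ
    a = F(sxx * sy - sx * sxy) / det
    b = F(n * sxy - sx * sy) / det
    return a, b

points = [(0, 1), (1, 3), (2, 4), (3, 4)]    # data of Example 1
a, b = fit_line(points)

# Residuals d_i = |y_i - (a + b x_i)| for the fitted line.
residuals = [abs(y - (a + b * x)) for x, y in points]
```

The sums `n`, `sx`, `sxx` are exactly the entries of M^T M, and `sy`, `sxy` are the entries of M^T y, so the function is Formula (7) written out for the two-column case.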


► EXAMPLE 2 Spring Constant 

Hooke's law in physics states that the length x of a uniform spring is a linear function of 
the force y applied to it. If we express this relationship as y = a + bx, then the coefficient 
b is called the spring constant. Suppose a particular unstretched spring has a measured 
length of 6.1 inches (i.e., x = 6.1 when y = 0). Suppose further that, as illustrated in 
Figure 6.5.4, various weights are attached to the end of the spring and the following table 
of resulting spring lengths is recorded. Find the least squares straight line fit to the data 
and use it to approximate the spring constant. 


Weight y (lb)     0      2      4      6 
Length x (in)     6.1    7.6    8.7    10.4 


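Solution The mathematical problem is to fit a line y = a + bx to the four data points 

(6.1, 0), (7.6, 2), (8.7, 4), (10.4, 6) 

For these data the matrices M and y in (4) are 

$$M = \begin{bmatrix} 1 & 6.1 \\ 1 & 7.6 \\ 1 & 8.7 \\ 1 & 10.4 \end{bmatrix}, \quad \mathbf{y} = \begin{bmatrix} 0 \\ 2 \\ 4 \\ 6 \end{bmatrix}$$

so 

$$\mathbf{v}^* = \begin{bmatrix} a^* \\ b^* \end{bmatrix} = (M^TM)^{-1}M^T\mathbf{y} \approx \begin{bmatrix} -8.6 \\ 1.4 \end{bmatrix}$$

where the numerical values have been rounded to one decimal place. Thus, the estimated 
value of the spring constant is $b^* \approx 1.4$ pounds/inch. 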


Least Squares Fit of a 
Polynomial 


The technique described for fitting a straight line to data points can be generalized to 
fitting a polynomial of specified degree to data points. Let us attempt to fit a polynomial 
of fixed degree m 

$$y = a_0 + a_1x + \cdots + a_mx^m \tag{9}$$

to n points 

$$(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$$

Substituting these n values of x and y into (9) yields the n equations 

$$\begin{aligned} y_1 &= a_0 + a_1x_1 + \cdots + a_mx_1^m \\ y_2 &= a_0 + a_1x_2 + \cdots + a_mx_2^m \\ &\;\;\vdots \\ y_n &= a_0 + a_1x_n + \cdots + a_mx_n^m \end{aligned}$$

or in matrix form, 

$$\mathbf{y} = M\mathbf{v} \tag{10}$$

where 

$$\mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}, \quad M = \begin{bmatrix} 1 & x_1 & x_1^2 & \cdots & x_1^m \\ 1 & x_2 & x_2^2 & \cdots & x_2^m \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_n & x_n^2 & \cdots & x_n^m \end{bmatrix}, \quad \mathbf{v} = \begin{bmatrix} a_0 \\ a_1 \\ \vdots \\ a_m \end{bmatrix} \tag{11}$$

As before, the solutions of the normal equations 

$$M^TM\mathbf{v} = M^T\mathbf{y}$$

determine the coefficients of the polynomial, and the vector v minimizes 

$$\|\mathbf{y} - M\mathbf{v}\|$$

Conditions that guarantee the invertibility of $M^TM$ are discussed in the exercises (Exercise 16). 
If $M^TM$ is invertible, then the normal equations have a unique solution $\mathbf{v} = \mathbf{v}^*$, 
which is given by 

$$\mathbf{v}^* = (M^TM)^{-1}M^T\mathbf{y} \tag{12}$$


► EXAMPLE 3 Fitting a Quadratic Curve to Data 

According to Newton's second law of motion, a body near the Earth's surface falls 
vertically downward in accordance with the equation 

$$s = s_0 + v_0t + \tfrac{1}{2}gt^2 \tag{13}$$

where 

s = vertical displacement downward relative to some reference point 
$s_0$ = displacement from the reference point at time t = 0 
$v_0$ = velocity at time t = 0 
g = acceleration of gravity at the Earth's surface 

Suppose that a laboratory experiment is performed to approximate g by measuring the 
displacement s relative to a fixed reference point of a falling weight at various times. Use 
the experimental results shown in the following table to approximate g. 


Time t (sec)            .1       .2      .3      .4      .5 
Displacement s (ft)     -0.18    0.31    1.03    2.48    3.73 
Solution For notational simplicity, let $a_0 = s_0$, $a_1 = v_0$, and $a_2 = \tfrac{1}{2}g$ in (13), so our 
mathematical problem is to fit a quadratic curve 

$$s = a_0 + a_1t + a_2t^2 \tag{14}$$

to the five data points: 

(.1, -0.18), (.2, 0.31), (.3, 1.03), (.4, 2.48), (.5, 3.73) 

▲ Figure 6.5.5 [horizontal axis: Time t (in seconds), 0 to .6] 

With the appropriate adjustments in notation, the matrices M and y in (11) are 

$$M = \begin{bmatrix} 1 & t_1 & t_1^2 \\ 1 & t_2 & t_2^2 \\ 1 & t_3 & t_3^2 \\ 1 & t_4 & t_4^2 \\ 1 & t_5 & t_5^2 \end{bmatrix} = \begin{bmatrix} 1 & .1 & .01 \\ 1 & .2 & .04 \\ 1 & .3 & .09 \\ 1 & .4 & .16 \\ 1 & .5 & .25 \end{bmatrix}, \quad \mathbf{y} = \begin{bmatrix} s_1 \\ s_2 \\ s_3 \\ s_4 \\ s_5 \end{bmatrix} = \begin{bmatrix} -0.18 \\ 0.31 \\ 1.03 \\ 2.48 \\ 3.73 \end{bmatrix}$$

Thus, from (12), 

$$\mathbf{v}^* = \begin{bmatrix} a_0^* \\ a_1^* \\ a_2^* \end{bmatrix} = (M^TM)^{-1}M^T\mathbf{y} \approx \begin{bmatrix} -0.40 \\ 0.35 \\ 16.1 \end{bmatrix}$$

so the least squares quadratic fit is 

$$s = -0.40 + 0.35t + 16.1t^2$$

From this equation we estimate that $\tfrac{1}{2}g = 16.1$ and hence that g = 32.2 ft/sec^2. Note 
that this equation also provides the following estimates of the initial displacement and 
velocity of the weight: 

$s_0 = a_0^* = -0.40$ ft 
$v_0 = a_1^* = 0.35$ ft/sec 

In Figure 6.5.5 we have plotted the data points and the approximating polynomial. 
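
The polynomial fit of Example 3 can be reproduced numerically. The sketch below is one possible way to do it in plain Python, under stated assumptions: the helper names `solve_linear_system` and `fit_polynomial` are hypothetical, and the normal equations are solved by Gaussian elimination with partial pivoting instead of forming the inverse of M^T M as in Formula (12).

```python
def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    # Work on an augmented copy so the inputs are not modified.
    aug = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("M^T M is singular (or nearly so)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution on the upper triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def fit_polynomial(xs, ys, m):
    """Return [a_0, ..., a_m] solving the normal equations M^T M v = M^T y."""
    M = [[x ** j for j in range(m + 1)] for x in xs]
    MtM = [[sum(row[i] * row[j] for row in M) for j in range(m + 1)]
           for i in range(m + 1)]
    Mty = [sum(row[i] * y for row, y in zip(M, ys)) for i in range(m + 1)]
    return solve_linear_system(MtM, Mty)


t = [0.1, 0.2, 0.3, 0.4, 0.5]          # times from Example 3
s = [-0.18, 0.31, 1.03, 2.48, 3.73]    # displacements from Example 3
a0, a1, a2 = fit_polynomial(t, s, 2)
g_estimate = 2 * a2                    # since a_2 = g/2 in (13)
```

With the data of Example 3 the coefficients agree with the values -0.40, 0.35, and 16.1 quoted in the solution after rounding. For larger degrees the matrix M^T M tends to become ill-conditioned, so a production computation would likely use a QR-based solver instead.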
Temperature of Venusian Atmosphere 

Magellan orbit 3213 
Date: 5 October 1991 
Latitude: 67 N 
LTST: 22:05 

[Figure: temperature T versus altitude h (km), 30 to 100 km. Source: NASA] 

On October 5, 1991 the Magellan spacecraft 
entered the atmosphere of Venus and transmitted the temperature 
T in kelvins (K) versus the altitude h in kilometers (km) 
until its signal was lost at an altitude of about 34 km. Discounting 
the initial erratic signal, the data strongly suggested a linear 
relationship, so a least squares straight line fit was used on the 
linear part of the data to obtain the equation 

T = 737.5 - 8.125h 

By setting h = 0 in this equation, the surface temperature of 
Venus was estimated at $T \approx 737.5$ K. The accuracy of this result 
has been confirmed by more recent flybys of Venus. 
Exercise Set 6.5 

In Exercises 1-2, find the least squares straight line fit 
y = ax + b to the data points, and show that the result is reasonable 
by graphing the fitted line and plotting the data in the 
same coordinate system. 

1. (0, 0), (1, 2), (2, 7)     2. (0, 1), (2, 0), (3, 1), (3, 2) 

In Exercises 3-4, find the least squares quadratic fit 
$y = a_0 + a_1x + a_2x^2$ to the data points, and show that the result 
is reasonable by graphing the fitted curve and plotting the data in 
the same coordinate system. 

3. (2, 0), (3, -10), (5, -48), (6, -76) 

4. (1, -2), (0, -1), (1, 0), (2, 4) 

5. Find a curve of the form y = a + (b/x) that best fits the 
data points (1, 7), (3, 3), (6, 1) by making the substitution 
X = 1/x. 

6. Find a curve of the form $y = a + b\sqrt{x}$ that best fits the data 
points (3, 1.5), (7, 2.5), (10, 3) by making the substitution 
$X = \sqrt{x}$. Show that the result is reasonable by graphing 
the fitted curve and plotting the data in the same coordinate 
system. 

Working with Proofs 

7. Prove that the matrix M in Equation (3) has linearly independent 
columns if and only if at least two of the numbers 
$x_1, x_2, \ldots, x_n$ are distinct. 

8. Prove that the columns of the n x (m + 1) matrix M in Equation (11) 
are linearly independent if n > m and at least m + 1 
of the numbers $x_1, x_2, \ldots, x_n$ are distinct. [Hint: A nonzero 
polynomial of degree m has at most m distinct roots.] 

9. Let M be the matrix in Equation (11). Using Exercise 8, show 
that a sufficient condition for the matrix $M^TM$ to be invertible 
is that n > m and that at least m + 1 of the numbers 
$x_1, x_2, \ldots, x_n$ are distinct. 

True-False Exercises 

TF. In parts (a)-(d) determine whether the statement is true or 
false, and justify your answer. 

(a) Every set of data points has a unique least squares straight 
line fit. 

(b) If the data points $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ are not collinear, 
then (2) is an inconsistent system. 

(c) If the data points $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ do not lie on 
a vertical line, then the expression 

$$|y_1 - (a + bx_1)|^2 + |y_2 - (a + bx_2)|^2 + \cdots + |y_n - (a + bx_n)|^2$$

is minimized by taking a and b to be the coefficients in the 
least squares line y = a + bx of best fit to the data. 

(d) If the data points $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ do not lie on 
a vertical line, then the expression 

$$|y_1 - (a + bx_1)| + |y_2 - (a + bx_2)| + \cdots + |y_n - (a + bx_n)|$$

is minimized by taking a and b to be the coefficients in the 
least squares line y = a + bx of best fit to the data. 

Working with Technology 

In Exercises T1-T2, find the normal system for the least 
squares cubic fit $y = a_0 + a_1x + a_2x^2 + a_3x^3$ to the data points. 
Solve the system and show that the result is reasonable by graphing 
the fitted curve and plotting the data in the same coordinate 
system. 

T1. (-1, -14), (0, -5), (1, -4), (2, 1), (3, 22) 

T2. (0, -10), (1, -1), (2, 0), (3, 5), (4, 26) 

T3. The owner of a rapidly expanding business finds that for 
the first five months of the year the sales (in thousands) are 
$4.0, $4.4, $5.2, $6.4, and $8.0. The owner plots these figures on 
a graph and conjectures that for the rest of the year, the sales curve 
can be approximated by a quadratic polynomial. Find the least 
squares quadratic polynomial fit to the sales curve, and use it to 
project the sales for the twelfth month of the year. 

T4. Pathfinder is an experimental, lightweight, remotely piloted, 
solar-powered aircraft that was used in a series of experiments by 
NASA to determine the feasibility of applying solar power for 
long-duration, high-altitude flights. In August 1997 Pathfinder 
recorded the data in the accompanying table relating altitude H 
and temperature T . Show that a linear model is reasonable by plot- 
ting the data, and then find the least squares line H = Ho + kT 
of best fit. 


Table Ex-T4 

Altitude H (thousands of feet)    15     20      25       30       35       40       45 
Temperature T (°C)                4.5    -5.9    -16.1    -27.6    -39.8    -50.2    -62.9 


Three important models in applications are 

exponential models ($y = ae^{bx}$) 
power function models ($y = ax^b$) 
logarithmic models ($y = a + b\ln x$) 

where a and b are to be determined to fit experimental data as 
closely as possible. Exercises T5-T7 are concerned with a procedure, 
called linearization, by which the data are transformed to a 
form in which a least squares straight line fit can be used to approximate 
the constants. Calculus is required for these exercises. 
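
As an illustration of linearization for the exponential model, the following sketch fits a straight line to the points (x, ln y) and then recovers a from ln a. The data here are synthetic, generated from y = 2e^{0.5x} purely for demonstration; they are not taken from the exercises, and the function names are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Least squares line Y = c + b*x; returns (c, b)."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = n * sxx - sx * sx
    return (sxx * sy - sx * sxy) / det, (n * sxy - sx * sy) / det


def fit_exponential(xs, ys):
    """Fit y = a*exp(b*x) by a straight line fit to (x, ln y).

    Requires every y to be positive so that ln y is defined.
    """
    ln_a, b = fit_line(xs, [math.log(y) for y in ys])
    return math.exp(ln_a), b


# Synthetic data lying exactly on y = 2*exp(0.5*x) (an assumption for the demo)
xs = [0, 1, 2, 3]
ys = [2 * math.exp(0.5 * x) for x in xs]
a, b = fit_exponential(xs, ys)
```

Because the synthetic points lie exactly on the curve, the fit recovers a = 2 and b = 0.5 up to rounding. Note that the line fit minimizes the squared errors in ln y, not in y itself, so for noisy data the result generally differs from a direct least squares fit of the exponential curve.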

T5. (a) Show that making the substitution Y = ln y in the equation 
$y = ae^{bx}$ produces the equation Y = bx + ln a whose 
graph in the xY-plane is a line of slope b and Y-intercept ln a. 

(b) Part (a) suggests that a curve of the form $y = ae^{bx}$ can be fitted 
to n data points $(x_i, y_i)$ by letting $Y_i = \ln y_i$, then fitting 
a straight line to the transformed data points $(x_i, Y_i)$ by least 
squares to find b and ln a, and then computing a from ln a. 
Use this method to fit an exponential model to the following 
data, and graph the curve and data in the same coordinate 
system. 

x    0      1      2      3      4     5     6     7 
y    3.9    5.3    7.2    9.6    12    17    23    31 


T6. (a) Show that making the substitutions 

X = ln x and Y = ln y 

in the equation $y = ax^b$ produces the equation Y = bX + ln a 
whose graph in the XY-plane is a line of slope b and Y-intercept 
ln a. 

(b) Part (a) suggests that a curve of the form $y = ax^b$ can be fitted 
to n data points $(x_i, y_i)$ by letting $X_i = \ln x_i$ and $Y_i = \ln y_i$, 
then fitting a straight line to the transformed data points 
$(X_i, Y_i)$ by least squares to find b and ln a, and then computing 
a from ln a. Use this method to fit a power function 
model to the following data, and graph the curve and data in 
the same coordinate system. 

x    2       3       4       5       6       7       8       9 
y    1.75    1.91    2.03    2.13    2.22    2.30    2.37    2.43 

T7. (a) Show that making the substitution X = ln x in the equation 
y = a + b ln x produces the equation y = a + bX whose 
graph in the Xy-plane is a line of slope b and y-intercept a. 

(b) Part (a) suggests that a curve of the form y = a + b ln x can 
be fitted to n data points $(x_i, y_i)$ by letting $X_i = \ln x_i$ and then 
fitting a straight line to the transformed data points $(X_i, y_i)$ 
by least squares to find b and a. Use this method to fit a logarithmic 
model to the following data, and graph the curve and 
data in the same coordinate system. 

x    2       3       4       5       6       7       8       9 
y    4.07    5.30    6.21    6.79    7.32    7.91    8.23    8.51 


6.6 Function Approximation; Fourier Series 

In this section we will show how orthogonal projections can be used to approximate certain 
types of functions by simpler functions. The ideas explained here have important 
applications in engineering and science. Calculus is required. 


Best Approximations 

All of the problems that we will study in this section will be special cases of the following 
general problem. 

Approximation Problem Given a function f that is continuous on an interval [a, b], 
find the "best possible approximation" to f using only functions from a specified 
subspace W of C[a, b]. 

Here are some examples of such problems: 

(a) Find the best possible approximation to $e^x$ over [0, 1] by a polynomial of the form 
$a_0 + a_1x + a_2x^2$. 

(b) Find the best possible approximation to $\sin \pi x$ over [-1, 1] by a function of the 
form $a_0 + a_1e^x + a_2e^{2x} + a_3e^{3x}$. 

(c) Find the best possible approximation to x over $[0, 2\pi]$ by a function of the form 
$a_0 + a_1\sin x + a_2\sin 2x + b_1\cos x + b_2\cos 2x$. 

In the first example W is the subspace of C[0, 1] spanned by 1, x, and $x^2$; in the second 
example W is the subspace of C[-1, 1] spanned by 1, $e^x$, $e^{2x}$, and $e^{3x}$; and in the third 
example W is the subspace of $C[0, 2\pi]$ spanned by 1, sin x, sin 2x, cos x, and cos 2x. 
Measurements of Error 

▲ Figure 6.6.1 The deviation 
between f and g at $x_0$. 

To solve approximation problems of the preceding types, we first need to make the phrase 
"best approximation over [a, b]" mathematically precise. To do this we will need some 
way of quantifying the error that results when one continuous function is approximated 
by another over an interval [a, b]. If we were to approximate f(x) by g(x), and if we 
were concerned only with the error in that approximation at a single point $x_0$, then it 
would be natural to define the error to be 

$$\text{error} = |f(x_0) - g(x_0)|$$

sometimes called the deviation between f and g at $x_0$ (Figure 6.6.1). However, we are not 
concerned simply with measuring the error at a single point but rather with measuring 
it over the entire interval [a, b]. The difficulty is that an approximation may have small 
deviations in one part of the interval and large deviations in another. One possible way 
of accounting for this is to integrate the deviation |f(x) - g(x)| over the interval [a, b] 
and define the error over the interval to be 

$$\text{error} = \int_a^b |f(x) - g(x)|\,dx \tag{1}$$

▲ Figure 6.6.2 The area 
between the graphs of f and g 
over [a, b] measures the error in 
approximating f by g over [a, b]. 

Geometrically, (1) is the area between the graphs of f(x) and g(x) over the interval [a, b] 
(Figure 6.6.2); the greater the area, the greater the overall error. 

Although (1) is natural and appealing geometrically, most mathematicians and scientists 
generally favor the following alternative measure of error, called the mean square 
error: 

$$\text{mean square error} = \int_a^b [f(x) - g(x)]^2\,dx$$

Mean square error emphasizes the effect of larger errors because of the squaring and 
has the added advantage that it allows us to bring to bear the theory of inner product 
spaces. To see how, suppose that f is a continuous function on [a, b] that we want to 
approximate by a function g from a subspace W of C[a, b], and suppose that C[a, b] is 
given the inner product 

$$\langle \mathbf{f}, \mathbf{g} \rangle = \int_a^b f(x)g(x)\,dx$$

It follows that 

$$\|\mathbf{f} - \mathbf{g}\|^2 = \langle \mathbf{f} - \mathbf{g}, \mathbf{f} - \mathbf{g} \rangle = \int_a^b [f(x) - g(x)]^2\,dx = \text{mean square error}$$

so minimizing the mean square error is the same as minimizing $\|\mathbf{f} - \mathbf{g}\|^2$. Thus, the 
approximation problem posed informally at the beginning of this section can be restated 
more precisely as follows. 


Least Squares 
Approximation 


Least Squares Approximation Problem Let f be a function that is continuous on an 
interval [a, b], let C[a, b] have the inner product 

$$\langle \mathbf{f}, \mathbf{g} \rangle = \int_a^b f(x)g(x)\,dx$$

and let W be a finite-dimensional subspace of C[a, b]. Find a function g in W that 
minimizes 

$$\|\mathbf{f} - \mathbf{g}\|^2 = \int_a^b [f(x) - g(x)]^2\,dx$$
Since $\|\mathbf{f} - \mathbf{g}\|^2$ and $\|\mathbf{f} - \mathbf{g}\|$ are minimized by the same function g, this problem is equivalent 
to looking for a function g in W that is closest to f. But we know from Theorem 6.4.1 
that $\mathbf{g} = \text{proj}_W\,\mathbf{f}$ is such a function (Figure 6.6.3). Thus, we have the following result. 

► Figure 6.6.3 [f = function in C[a, b] to be approximated; W = subspace of approximating 
functions; $\mathbf{g} = \text{proj}_W\,\mathbf{f}$ = least squares approximation to f from W] 

THEOREM 6.6.1 If f is a continuous function on [a, b], and W is a finite-dimensional 
subspace of C[a, b], then the function g in W that minimizes the mean square error 

$$\int_a^b [f(x) - g(x)]^2\,dx$$

is $\mathbf{g} = \text{proj}_W\,\mathbf{f}$, where the orthogonal projection is relative to the inner product 

$$\langle \mathbf{f}, \mathbf{g} \rangle = \int_a^b f(x)g(x)\,dx$$

The function $\mathbf{g} = \text{proj}_W\,\mathbf{f}$ is called the least squares approximation to f from W. 

Fourier Series 



A function of the form 

$$T(x) = c_0 + c_1\cos x + c_2\cos 2x + \cdots + c_n\cos nx + d_1\sin x + d_2\sin 2x + \cdots + d_n\sin nx \tag{2}$$

is called a trigonometric polynomial; if $c_n$ and $d_n$ are not both zero, then T(x) is said to 
have order n. For example, 

$$T(x) = 2 + \cos x - 3\cos 2x + 7\sin 4x$$

is a trigonometric polynomial of order 4 with 

$$c_0 = 2,\ c_1 = 1,\ c_2 = -3,\ c_3 = 0,\ c_4 = 0,\ d_1 = 0,\ d_2 = 0,\ d_3 = 0,\ d_4 = 7$$

It is evident from (2) that the trigonometric polynomials of order n or less are the 
various possible linear combinations of 

$$1,\ \cos x,\ \cos 2x,\ \ldots,\ \cos nx,\ \sin x,\ \sin 2x,\ \ldots,\ \sin nx \tag{3}$$

It can be shown that these 2n + 1 functions are linearly independent and thus form a 
basis for a (2n + 1)-dimensional subspace of C[a, b]. 

Let us now consider the problem of finding the least squares approximation of a 
continuous function f(x) over the interval $[0, 2\pi]$ by a trigonometric polynomial of 
order n or less. As noted above, the least squares approximation to f from W is the 
orthogonal projection of f on W. To find this orthogonal projection, we must find an 
orthonormal basis $\mathbf{g}_0, \mathbf{g}_1, \ldots, \mathbf{g}_{2n}$ for W, after which we can compute the orthogonal 
projection on W from the formula 

$$\text{proj}_W\,\mathbf{f} = \langle \mathbf{f}, \mathbf{g}_0 \rangle \mathbf{g}_0 + \langle \mathbf{f}, \mathbf{g}_1 \rangle \mathbf{g}_1 + \cdots + \langle \mathbf{f}, \mathbf{g}_{2n} \rangle \mathbf{g}_{2n} \tag{4}$$
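[see Theorem 6.3.4(b)]. An orthonormal basis for W can be obtained by applying the 
Gram-Schmidt process to the basis vectors in (3) using the inner product 

$$\langle \mathbf{f}, \mathbf{g} \rangle = \int_0^{2\pi} f(x)g(x)\,dx$$

This yields the orthonormal basis 

$$\mathbf{g}_0 = \frac{1}{\sqrt{2\pi}}, \quad \mathbf{g}_1 = \frac{1}{\sqrt{\pi}}\cos x, \ \ldots, \ \mathbf{g}_n = \frac{1}{\sqrt{\pi}}\cos nx, \quad \mathbf{g}_{n+1} = \frac{1}{\sqrt{\pi}}\sin x, \ \ldots, \ \mathbf{g}_{2n} = \frac{1}{\sqrt{\pi}}\sin nx \tag{5}$$

(see Exercise 6). If we introduce the notation 

$$a_0 = \frac{2}{\sqrt{2\pi}}\langle \mathbf{f}, \mathbf{g}_0 \rangle, \quad a_1 = \frac{1}{\sqrt{\pi}}\langle \mathbf{f}, \mathbf{g}_1 \rangle, \ \ldots, \ a_n = \frac{1}{\sqrt{\pi}}\langle \mathbf{f}, \mathbf{g}_n \rangle$$
$$b_1 = \frac{1}{\sqrt{\pi}}\langle \mathbf{f}, \mathbf{g}_{n+1} \rangle, \ \ldots, \ b_n = \frac{1}{\sqrt{\pi}}\langle \mathbf{f}, \mathbf{g}_{2n} \rangle \tag{6}$$

then on substituting (5) in (4), we obtain 

$$\text{proj}_W\,\mathbf{f} = \frac{a_0}{2} + [a_1\cos x + \cdots + a_n\cos nx] + [b_1\sin x + \cdots + b_n\sin nx] \tag{7}$$

where 

$$a_0 = \frac{2}{\sqrt{2\pi}}\langle \mathbf{f}, \mathbf{g}_0 \rangle = \frac{2}{\sqrt{2\pi}}\int_0^{2\pi} f(x)\frac{1}{\sqrt{2\pi}}\,dx = \frac{1}{\pi}\int_0^{2\pi} f(x)\,dx$$

$$a_1 = \frac{1}{\sqrt{\pi}}\langle \mathbf{f}, \mathbf{g}_1 \rangle = \frac{1}{\sqrt{\pi}}\int_0^{2\pi} f(x)\frac{1}{\sqrt{\pi}}\cos x\,dx = \frac{1}{\pi}\int_0^{2\pi} f(x)\cos x\,dx$$

$$\vdots$$

$$a_n = \frac{1}{\sqrt{\pi}}\langle \mathbf{f}, \mathbf{g}_n \rangle = \frac{1}{\sqrt{\pi}}\int_0^{2\pi} f(x)\frac{1}{\sqrt{\pi}}\cos nx\,dx = \frac{1}{\pi}\int_0^{2\pi} f(x)\cos nx\,dx$$

$$b_1 = \frac{1}{\sqrt{\pi}}\langle \mathbf{f}, \mathbf{g}_{n+1} \rangle = \frac{1}{\sqrt{\pi}}\int_0^{2\pi} f(x)\frac{1}{\sqrt{\pi}}\sin x\,dx = \frac{1}{\pi}\int_0^{2\pi} f(x)\sin x\,dx$$

$$\vdots$$

$$b_n = \frac{1}{\sqrt{\pi}}\langle \mathbf{f}, \mathbf{g}_{2n} \rangle = \frac{1}{\sqrt{\pi}}\int_0^{2\pi} f(x)\frac{1}{\sqrt{\pi}}\sin nx\,dx = \frac{1}{\pi}\int_0^{2\pi} f(x)\sin nx\,dx$$

In short, 

$$a_k = \frac{1}{\pi}\int_0^{2\pi} f(x)\cos kx\,dx, \quad b_k = \frac{1}{\pi}\int_0^{2\pi} f(x)\sin kx\,dx \tag{8}$$

The numbers $a_0, a_1, \ldots, a_n, b_1, \ldots, b_n$ are called the Fourier coefficients of f. 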
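
When the integrals in Formula (8) are awkward to evaluate by hand, they can be approximated numerically. The following sketch is only an illustration under stated assumptions: it uses composite Simpson's rule (a choice made here, not something the text prescribes), and the names `integrate` and `fourier_coefficients` are hypothetical.

```python
import math


def integrate(func, a, b, steps=2000):
    """Approximate the integral of func over [a, b] by composite
    Simpson's rule; steps must be even."""
    h = (b - a) / steps
    total = func(a) + func(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3


def fourier_coefficients(f, n):
    """Return lists (a, b) with a[k] and b[k] as in Formula (8).

    a has entries a[0..n]; b[0] is a placeholder so that b[k] matches b_k.
    """
    two_pi = 2 * math.pi
    a = [integrate(lambda x: f(x) * math.cos(k * x), 0, two_pi) / math.pi
         for k in range(n + 1)]
    b = [0.0] + [integrate(lambda x: f(x) * math.sin(k * x), 0, two_pi) / math.pi
                 for k in range(1, n + 1)]
    return a, b


# Coefficients of f(x) = x on [0, 2*pi] up to order 3
a, b = fourier_coefficients(lambda x: x, 3)
```

For f(x) = x the exact values are a_0 = 2π, a_k = 0, and b_k = -2/k for k ≥ 1, so the numerical output can be compared against them.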


► EXAMPLE 1 Least Squares Approximations 

Find the least squares approximation of f(x) = x on $[0, 2\pi]$ by 

(a) a trigonometric polynomial of order 2 or less; 

(b) a trigonometric polynomial of order n or less. 
Solution (a) 

$$a_0 = \frac{1}{\pi}\int_0^{2\pi} f(x)\,dx = \frac{1}{\pi}\int_0^{2\pi} x\,dx = 2\pi \tag{9a}$$

For k = 1, 2, . . . , integration by parts yields (verify) 

$$a_k = \frac{1}{\pi}\int_0^{2\pi} f(x)\cos kx\,dx = \frac{1}{\pi}\int_0^{2\pi} x\cos kx\,dx = 0 \tag{9b}$$

$$b_k = \frac{1}{\pi}\int_0^{2\pi} f(x)\sin kx\,dx = \frac{1}{\pi}\int_0^{2\pi} x\sin kx\,dx = -\frac{2}{k} \tag{9c}$$

Thus, the least squares approximation to x on $[0, 2\pi]$ by a trigonometric polynomial of 
order 2 or less is 

$$x \approx \frac{a_0}{2} + a_1\cos x + a_2\cos 2x + b_1\sin x + b_2\sin 2x$$

or, from (9a), (9b), and (9c), 

$$x \approx \pi - 2\sin x - \sin 2x$$



Jean Baptiste 
Fourier (1768-1830) 


Historical Note Fourier was a 
French mathematician and physi- 
cist who discovered the Fourier 
series and related ideas while 
working on problems of heat 
diffusion. This discovery was 
one of the most influential in 
the history of mathematics; it is 
the cornerstone of many fields 
of mathematical research and a 
basic tool in many branches of en- 
gineering. Fourier, a political ac- 
tivist during the French revolution, 
spent time in jail for his defense 
of many victims during the Ter- 
ror. He later became a favorite of 
Napoleon who made him a baron. 

[Image: Hulton Archive/ 
Getty Images] 


Solution (b) The least squares approximation to x on $[0, 2\pi]$ by a trigonometric polynomial 
of order n or less is 

$$x \approx \frac{a_0}{2} + [a_1\cos x + \cdots + a_n\cos nx] + [b_1\sin x + \cdots + b_n\sin nx]$$

or, from (9a), (9b), and (9c), 

$$x \approx \pi - 2\left(\sin x + \frac{\sin 2x}{2} + \frac{\sin 3x}{3} + \cdots + \frac{\sin nx}{n}\right)$$

The graphs of y = x and some of these approximations are shown in Figure 6.6.4. 

► Figure 6.6.4 

It is natural to expect that the mean square error will diminish as the number of terms 
in the least squares approximation 

$$f(x) \approx \frac{a_0}{2} + \sum_{k=1}^{n} (a_k\cos kx + b_k\sin kx)$$

increases. It can be proved that for functions f in $C[0, 2\pi]$, the mean square error 
approaches zero as $n \to +\infty$; this is denoted by writing 

$$f(x) = \frac{a_0}{2} + \sum_{k=1}^{\infty} (a_k\cos kx + b_k\sin kx)$$

The right side of this equation is called the Fourier series for f over the interval $[0, 2\pi]$. 
Such series are of major importance in engineering, science, and mathematics. 
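
The decrease of the mean square error can be observed numerically for f(x) = x on [0, 2π], whose least squares approximation of order n is π - 2(sin x + (sin 2x)/2 + ... + (sin nx)/n). The sketch below is an illustration only; it approximates the integral by composite Simpson's rule, and the function names are made up for the example.

```python
import math


def partial_sum(x, n):
    """T_n(x) = pi - 2*(sin x + sin(2x)/2 + ... + sin(nx)/n)."""
    return math.pi - 2 * sum(math.sin(k * x) / k for k in range(1, n + 1))


def mean_square_error(n, steps=2000):
    """Approximate the integral of [x - T_n(x)]^2 over [0, 2*pi]
    by composite Simpson's rule (steps must be even)."""
    a, b = 0.0, 2 * math.pi
    h = (b - a) / steps

    def integrand(x):
        return (x - partial_sum(x, n)) ** 2

    total = integrand(a) + integrand(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(a + i * h)
    return total * h / 3


# Mean square errors for orders n = 1, 2, 3, 4, 5
errors = [mean_square_error(n) for n in range(1, 6)]
for n, err in enumerate(errors, start=1):
    print(n, round(err, 4))
```

The printed errors shrink as n grows, although slowly, since the coefficients -2/k of this particular function decay only like 1/k.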
Exercise Set 6.6 

1. Find the least squares approximation of f(x) = 1 + x over 
the interval $[0, 2\pi]$ by 

(a) a trigonometric polynomial of order 2 or less. 

(b) a trigonometric polynomial of order n or less. 

2. Find the least squares approximation of $f(x) = x^2$ over the 
interval $[0, 2\pi]$ by 

(a) a trigonometric polynomial of order 3 or less. 

(b) a trigonometric polynomial of order n or less. 

3. (a) Find the least squares approximation of x over the interval 
[0, 1] by a function of the form $a + be^x$. 

(b) Find the mean square error of the approximation. 

4. (a) Find the least squares approximation of $e^x$ over the interval 
[0, 1] by a polynomial of the form $a_0 + a_1x$. 

(b) Find the mean square error of the approximation. 

5. (a) Find the least squares approximation of $\sin \pi x$ over the 
interval [-1, 1] by a polynomial of the form 
$a_0 + a_1x + a_2x^2$. 

(b) Find the mean square error of the approximation. 

6. Use the Gram-Schmidt process to obtain the orthonormal 
basis (5) from the basis (3). 

7. Carry out the integrations indicated in Formulas (9a), (9b), 
and (9c). 

8. Find the Fourier series of $f(x) = \pi - x$ over the interval 
$[0, 2\pi]$. 

9. Find the Fourier series of f(x) = 1, $0 < x < \pi$ and f(x) = 0, 
$\pi \le x \le 2\pi$ over the interval $[0, 2\pi]$. 

10. What is the Fourier series of sin(3x)? 

True-False Exercises 

TF. In parts (a)-(e) determine whether the statement is true or 
false, and justify your answer. 

(a) If a function f in C[a, b] is approximated by the function g, 
then the mean square error is the same as the area between the 
graphs of f(x) and g(x) over the interval [a, b]. 

(b) Given a finite-dimensional subspace W of C[a, b], the function 
$\mathbf{g} = \text{proj}_W\,\mathbf{f}$ minimizes the mean square error. 

(c) {1, cos x, sin x, cos 2x, sin 2x} is an orthogonal subset of the 
vector space $C[0, 2\pi]$ with respect to the inner product 
$\langle \mathbf{f}, \mathbf{g} \rangle = \int_0^{2\pi} f(x)g(x)\,dx$. 

(d) {1, cos x, sin x, cos 2x, sin 2x} is an orthonormal subset of 
the vector space $C[0, 2\pi]$ with respect to the inner product 
$\langle \mathbf{f}, \mathbf{g} \rangle = \int_0^{2\pi} f(x)g(x)\,dx$. 

(e) {1, cos x, sin x, cos 2x, sin 2x} is a linearly independent subset 
of $C[0, 2\pi]$. 
Chapter 6 Supplementary Exercises 

1. Let $R^4$ have the Euclidean inner product. 

(a) Find a vector in $R^4$ that is orthogonal to $\mathbf{u}_1 = (1, 0, 0, 0)$ 
and $\mathbf{u}_4 = (0, 0, 0, 1)$ and makes equal angles with 
$\mathbf{u}_2 = (0, 1, 0, 0)$ and $\mathbf{u}_3 = (0, 0, 1, 0)$. 

(b) Find a vector $\mathbf{x} = (x_1, x_2, x_3, x_4)$ of length 1 that is orthogonal 
to $\mathbf{u}_1$ and $\mathbf{u}_4$ above and such that the cosine of 
the angle between x and $\mathbf{u}_2$ is twice the cosine of the angle 
between x and $\mathbf{u}_3$. 

2. Prove: If $\langle \mathbf{u}, \mathbf{v} \rangle$ is the Euclidean inner product on $R^n$, and if A 
is an n x n matrix, then 

$$\langle \mathbf{u}, A\mathbf{v} \rangle = \langle A^T\mathbf{u}, \mathbf{v} \rangle$$

[Hint: Use the fact that $\langle \mathbf{u}, \mathbf{v} \rangle = \mathbf{u} \cdot \mathbf{v} = \mathbf{v}^T\mathbf{u}$.] 

3. Let $M_{22}$ have the inner product $\langle U, V \rangle = \text{tr}(U^TV) = \text{tr}(V^TU)$ 
that was defined in Example 6 of Section 6.1. Describe the 
orthogonal complement of 

(a) the subspace of all diagonal matrices. 

(b) the subspace of symmetric matrices. 

4. Let Ax = 0 be a system of m equations in n unknowns. Show 
that 

$$\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}$$

is a solution of this system if and only if the vector 
$\mathbf{x} = (x_1, x_2, \ldots, x_n)$ is orthogonal to every row vector of A 
with respect to the Euclidean inner product on $R^n$. 

5. Use the Cauchy-Schwarz inequality to show that if 
$a_1, a_2, \ldots, a_n$ are positive real numbers, then 

$$(a_1 + a_2 + \cdots + a_n)\left(\frac{1}{a_1} + \frac{1}{a_2} + \cdots + \frac{1}{a_n}\right) \ge n^2$$

6. Show that if x and y are vectors in an inner product space and 
c is any scalar, then 

$$\|c\mathbf{x} + \mathbf{y}\|^2 = c^2\|\mathbf{x}\|^2 + 2c\langle \mathbf{x}, \mathbf{y} \rangle + \|\mathbf{y}\|^2$$
7. Let $R^3$ have the Euclidean inner product. Find two vectors 
of length 1 that are orthogonal to all three of the vectors 
$\mathbf{u}_1 = (1, 1, -1)$, $\mathbf{u}_2 = (-2, -1, 2)$, and $\mathbf{u}_3 = (-1, 0, 1)$. 

8. Find a weighted Euclidean inner product on $R^n$ such that the 
vectors 

$\mathbf{v}_1 = (1, 0, 0, \ldots, 0)$ 
$\mathbf{v}_2 = (0, \sqrt{2}, 0, \ldots, 0)$ 
$\mathbf{v}_3 = (0, 0, \sqrt{3}, \ldots, 0)$ 
$\vdots$ 
$\mathbf{v}_n = (0, 0, 0, \ldots, \sqrt{n})$ 

form an orthonormal set. 

9. Is there a weighted Euclidean inner product on $R^2$ for which 
the vectors (1, 2) and (3, -1) form an orthonormal set? Justify 
your answer. 

10. If u and v are vectors in an inner product space V, then u, v, 
and u - v can be regarded as sides of a "triangle" in V (see 
the accompanying figure). Prove that the law of cosines holds 
for any such triangle; that is, 

$$\|\mathbf{u} - \mathbf{v}\|^2 = \|\mathbf{u}\|^2 + \|\mathbf{v}\|^2 - 2\|\mathbf{u}\|\,\|\mathbf{v}\|\cos\theta$$

where $\theta$ is the angle between u and v. 

◄ Figure Ex-10 

11. (a) As shown in Figure 3.2.6, the vectors (k, 0, 0), (0, k, 0), 
and (0, 0, k) form the edges of a cube in $R^3$ with diagonal 
(k, k, k). Similarly, the vectors 

(k, 0, 0, ..., 0), (0, k, 0, ..., 0), ..., (0, 0, 0, ..., k) 

can be regarded as edges of a "cube" in $R^n$ with diagonal 
(k, k, k, ..., k). Show that each of the above edges makes 
an angle of $\theta$ with the diagonal, where $\cos\theta = 1/\sqrt{n}$. 

(b) (Calculus required) What happens to the angle $\theta$ in part (a) 
as the dimension of $R^n$ approaches $\infty$? 

12. Let u and v be vectors in an inner product space. 

(a) Prove that $\|\mathbf{u}\| = \|\mathbf{v}\|$ if and only if u + v and u - v are 
orthogonal. 

(b) Give a geometric interpretation of this result in $R^2$ with 
the Euclidean inner product. 

13. Let u be a vector in an inner product space V, and let 
$\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}$ be an orthonormal basis for V. Show that 
if $\alpha_i$ is the angle between u and $\mathbf{v}_i$, then 

$$\cos^2\alpha_1 + \cos^2\alpha_2 + \cdots + \cos^2\alpha_n = 1$$

14. Prove: If $\langle \mathbf{u}, \mathbf{v} \rangle_1$ and $\langle \mathbf{u}, \mathbf{v} \rangle_2$ are two inner products on a vector 
space V, then the quantity $\langle \mathbf{u}, \mathbf{v} \rangle = \langle \mathbf{u}, \mathbf{v} \rangle_1 + \langle \mathbf{u}, \mathbf{v} \rangle_2$ is also an 
inner product. 

15. Prove Theorem 6.2.5. 

16. Prove: If A has linearly independent column vectors, and if b 
is orthogonal to the column space of A, then the least squares 
solution of Ax = b is x = 0. 

17. Is there any value of s for which $x_1 = 1$ and $x_2 = 2$ is the least 
squares solution of the following linear system? 

$$\begin{aligned} x_1 - x_2 &= 1 \\ 2x_1 + 3x_2 &= 1 \\ 4x_1 + 5x_2 &= s \end{aligned}$$

Explain your reasoning. 

18. Show that if p and q are distinct positive integers, then the 
functions f(x) = sin px and g(x) = sin qx are orthogonal 
with respect to the inner product 

$$\langle \mathbf{f}, \mathbf{g} \rangle = \int_0^{2\pi} f(x)g(x)\,dx$$

19. Show that if p and q are positive integers, then the functions 
f(x) = cos px and g(x) = sin qx are orthogonal with respect 
to the inner product 

$$\langle \mathbf{f}, \mathbf{g} \rangle = \int_0^{2\pi} f(x)g(x)\,dx$$

20. Let W be the intersection of the planes 

x + y + z = 0 and x - y + z = 0 

in $R^3$. Find an equation for $W^\perp$. 

21. Prove that if $ad - bc \ne 0$, then the matrix 

$$A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$$

has a unique QR-decomposition A = QR, where 

$$Q = \frac{1}{\sqrt{a^2 + c^2}}\begin{bmatrix} a & -c \\ c & a \end{bmatrix}, \quad R = \frac{1}{\sqrt{a^2 + c^2}}\begin{bmatrix} a^2 + c^2 & ab + cd \\ 0 & ad - bc \end{bmatrix}$$
INTRODUCTION In Section 5.2 we found conditions that guaranteed the diagonalizability of an n x n 
matrix, but we did not consider what class or classes of matrices might actually satisfy 
those conditions. In this chapter we will show that every symmetric matrix is 
diagonalizable. This is an extremely important result because many applications utilize 
it in some essential way. 


7.1 Orthogonal Matrices 

In this section we will discuss the class of matrices whose inverses can be obtained by 
transposition. Such matrices occur in a variety of applications and arise as well as 
transition matrices when one orthonormal basis is changed to another. 

Orthogonal Matrices 


We begin with the following definition. 

DEFINITION 1 A square matrix A is said to be orthogonal if its transpose is the same 
as its inverse, that is, if 

$$A^{-1} = A^T$$

or, equivalently, if 

$$AA^T = A^TA = I \tag{1}$$

[Margin note] Recall from Theorem 1.6.3 
that if either product in (1) 
holds, then so does the other. 
Thus, A is orthogonal if either 
$AA^T = I$ or $A^TA = I$. 


► EXAMPLE 1 A 3 x 3 Orthogonal Matrix 


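The matrix 

$$A = \begin{bmatrix} \frac{3}{7} & \frac{2}{7} & \frac{6}{7} \\ -\frac{6}{7} & \frac{3}{7} & \frac{2}{7} \\ \frac{2}{7} & \frac{6}{7} & -\frac{3}{7} \end{bmatrix}$$

is orthogonal since 

$$A^TA = \begin{bmatrix} \frac{3}{7} & -\frac{6}{7} & \frac{2}{7} \\ \frac{2}{7} & \frac{3}{7} & \frac{6}{7} \\ \frac{6}{7} & \frac{2}{7} & -\frac{3}{7} \end{bmatrix}\begin{bmatrix} \frac{3}{7} & \frac{2}{7} & \frac{6}{7} \\ -\frac{6}{7} & \frac{3}{7} & \frac{2}{7} \\ \frac{2}{7} & \frac{6}{7} & -\frac{3}{7} \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$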
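
A check of this kind is easy to automate. The sketch below uses exact rational arithmetic so that the comparison with the identity matrix is not affected by rounding. It is an illustration only: the helper names are hypothetical, and the matrix entered is the one assumed for Example 1, with rows (3/7, 2/7, 6/7), (-6/7, 3/7, 2/7), and (2/7, 6/7, -3/7).

```python
from fractions import Fraction as F


def transpose(A):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    """Return the matrix product AB using the row-column rule."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def is_orthogonal(A):
    """Test whether A^T A equals the identity matrix exactly."""
    n = len(A)
    identity = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    return matmul(transpose(A), A) == identity


# Matrix assumed for Example 1 (entries are sevenths)
A = [[F(3, 7), F(2, 7), F(6, 7)],
     [F(-6, 7), F(3, 7), F(2, 7)],
     [F(2, 7), F(6, 7), F(-3, 7)]]
print(is_orthogonal(A))  # prints True
```

With floating-point entries the exact equality test would have to be replaced by a comparison within a small tolerance.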
► EXAMPLE 2 Rotation and Reflection Matrices Are Orthogonal 

Recall from Table 5 of Section 4.9 that the standard matrix for the counterclockwise 
rotation of $R^2$ through an angle $\theta$ is 

$$A = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$$

This matrix is orthogonal for all choices of $\theta$ since 

$$A^TA = \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix}\begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$$

We leave it for you to verify that the reflection matrices in Tables 1 and 2 and the rotation 
matrices in Table 6 of Section 4.9 are all orthogonal. 


Observe that for the orthogonal matrices in Examples 1 and 2, both the row vec- 
tors and the column vectors form orthonormal sets with respect to the Euclidean inner 
product. This is a consequence of the following theorem. 


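THEOREM 7.1.1 The following are equivalent for an n x n matrix A. 

(a) A is orthogonal. 

(b) The row vectors of A form an orthonormal set in $R^n$ with the Euclidean inner 
product. 

(c) The column vectors of A form an orthonormal set in $R^n$ with the Euclidean inner 
product. 

Proof We will prove the equivalence of (a) and (b) and leave the equivalence of (a) and 
(c) as an exercise. 

(a) ⇔ (b) Let $\mathbf{r}_i$ be the ith row vector of A and $\mathbf{c}_j$ the jth column vector of $A^T$. Since transposing 
a matrix converts its columns to rows and rows to columns, it follows that $\mathbf{c}_j^T = \mathbf{r}_j$. 
Thus, it follows from the row-column rule [Formula (5) of Section 1.3] and the bottom 
form listed in Table 1 of Section 3.2 that 

$$AA^T = \begin{bmatrix} \mathbf{r}_1\mathbf{c}_1 & \mathbf{r}_1\mathbf{c}_2 & \cdots & \mathbf{r}_1\mathbf{c}_n \\ \mathbf{r}_2\mathbf{c}_1 & \mathbf{r}_2\mathbf{c}_2 & \cdots & \mathbf{r}_2\mathbf{c}_n \\ \vdots & \vdots & & \vdots \\ \mathbf{r}_n\mathbf{c}_1 & \mathbf{r}_n\mathbf{c}_2 & \cdots & \mathbf{r}_n\mathbf{c}_n \end{bmatrix} = \begin{bmatrix} \mathbf{r}_1 \cdot \mathbf{r}_1 & \mathbf{r}_1 \cdot \mathbf{r}_2 & \cdots & \mathbf{r}_1 \cdot \mathbf{r}_n \\ \mathbf{r}_2 \cdot \mathbf{r}_1 & \mathbf{r}_2 \cdot \mathbf{r}_2 & \cdots & \mathbf{r}_2 \cdot \mathbf{r}_n \\ \vdots & \vdots & & \vdots \\ \mathbf{r}_n \cdot \mathbf{r}_1 & \mathbf{r}_n \cdot \mathbf{r}_2 & \cdots & \mathbf{r}_n \cdot \mathbf{r}_n \end{bmatrix}$$

It is evident from this formula that $AA^T = I$ if and only if 

$$\mathbf{r}_1 \cdot \mathbf{r}_1 = \mathbf{r}_2 \cdot \mathbf{r}_2 = \cdots = \mathbf{r}_n \cdot \mathbf{r}_n = 1$$

and 

$$\mathbf{r}_i \cdot \mathbf{r}_j = 0 \quad \text{when } i \ne j$$

which are true if and only if $\{\mathbf{r}_1, \mathbf{r}_2, \ldots, \mathbf{r}_n\}$ is an orthonormal set in $R^n$. 

WARNING Note that an orthogonal 
matrix has orthonormal 
rows and columns, not 
simply orthogonal rows and 
columns. 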

The following theorem lists four more fundamental properties of orthogonal matrices.
The proofs are all straightforward and are left as exercises.

THEOREM 7.1.2

(a) The transpose of an orthogonal matrix is orthogonal.

(b) The inverse of an orthogonal matrix is orthogonal.

(c) A product of orthogonal matrices is orthogonal.

(d) If $A$ is orthogonal, then $\det(A) = 1$ or $\det(A) = -1$.

► EXAMPLE 3 det(A) = ±1 for an Orthogonal Matrix A

The matrix

$$A = \begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{bmatrix}$$

is orthogonal since its row (and column) vectors form orthonormal sets in $R^2$ with
the Euclidean inner product. We leave it for you to verify that $\det(A) = 1$ and that
interchanging the rows produces an orthogonal matrix whose determinant is $-1$. ◄

Orthogonal Matrices as Linear Operators

We observed in Example 2 that the standard matrices for the basic reflection and rotation
operators on $R^2$ and $R^3$ are orthogonal. The next theorem will explain why this is so.


THEOREM 7.1.3 If $A$ is an $n \times n$ matrix, then the following are equivalent.

(a) $A$ is orthogonal.

(b) $\|A\mathbf{x}\| = \|\mathbf{x}\|$ for all $\mathbf{x}$ in $R^n$.

(c) $A\mathbf{x}\cdot A\mathbf{y} = \mathbf{x}\cdot\mathbf{y}$ for all $\mathbf{x}$ and $\mathbf{y}$ in $R^n$.

Proof We will prove the sequence of implications (a) ⇒ (b) ⇒ (c) ⇒ (a).

(a) ⇒ (b) Assume that $A$ is orthogonal, so that $A^TA = I$. It follows from Formula (26)
of Section 3.2 that

$$\|A\mathbf{x}\| = (A\mathbf{x}\cdot A\mathbf{x})^{1/2} = (\mathbf{x}\cdot A^TA\mathbf{x})^{1/2} = (\mathbf{x}\cdot\mathbf{x})^{1/2} = \|\mathbf{x}\|$$

(b) ⇒ (c) Assume that $\|A\mathbf{x}\| = \|\mathbf{x}\|$ for all $\mathbf{x}$ in $R^n$. From Theorem 3.2.7 we have

$$A\mathbf{x}\cdot A\mathbf{y} = \tfrac14\|A\mathbf{x} + A\mathbf{y}\|^2 - \tfrac14\|A\mathbf{x} - A\mathbf{y}\|^2 = \tfrac14\|A(\mathbf{x} + \mathbf{y})\|^2 - \tfrac14\|A(\mathbf{x} - \mathbf{y})\|^2$$
$$= \tfrac14\|\mathbf{x} + \mathbf{y}\|^2 - \tfrac14\|\mathbf{x} - \mathbf{y}\|^2 = \mathbf{x}\cdot\mathbf{y}$$

(c) ⇒ (a) Assume that $A\mathbf{x}\cdot A\mathbf{y} = \mathbf{x}\cdot\mathbf{y}$ for all $\mathbf{x}$ and $\mathbf{y}$ in $R^n$. It follows from Formula (26)
of Section 3.2 that

$$\mathbf{x}\cdot\mathbf{y} = \mathbf{x}\cdot A^TA\mathbf{y}$$

which can be rewritten as $\mathbf{x}\cdot(A^TA\mathbf{y} - \mathbf{y}) = 0$ or as

$$\mathbf{x}\cdot(A^TA - I)\mathbf{y} = 0$$

Since this equation holds for all $\mathbf{x}$ in $R^n$, it holds in particular if $\mathbf{x} = (A^TA - I)\mathbf{y}$, so

$$(A^TA - I)\mathbf{y}\cdot(A^TA - I)\mathbf{y} = 0$$

Thus, it follows from the positivity axiom for inner products that

$$(A^TA - I)\mathbf{y} = \mathbf{0}$$

Since this equation is satisfied by every vector $\mathbf{y}$ in $R^n$, it must be that $A^TA - I$ is the
zero matrix (why?) and hence that $A^TA = I$. Thus, $A$ is orthogonal.

▲ Figure 7.1.1 [Figure not reproduced. It illustrates that $\|T_A(\mathbf{u})\| = \|\mathbf{u}\|$, $\|T_A(\mathbf{v})\| = \|\mathbf{v}\|$, $\alpha = \beta$, and $d(T_A(\mathbf{u}), T_A(\mathbf{v})) = d(\mathbf{u}, \mathbf{v})$.]

Theorem 7.1.3 has a useful geometric interpretation when considered from the viewpoint
of matrix transformations: If $A$ is an orthogonal matrix and $T_A: R^n \to R^n$ is multiplication
by $A$, then we will call $T_A$ an orthogonal operator on $R^n$. It follows from parts
(a) and (b) of Theorem 7.1.3 that the orthogonal operators on $R^n$ are precisely those
operators that leave the lengths (norms) of vectors unchanged. However, as illustrated
in Figure 7.1.1, this implies that orthogonal operators also leave angles and distances
between vectors in $R^n$ unchanged since these can be expressed in terms of norms [see
Definition 2 and Formula (20) of Section 3.2].
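The invariance properties in Theorem 7.1.3 can be illustrated numerically. The sketch below is in Python with NumPy (an assumption; the text itself uses no software), and the angle and the two test vectors are arbitrary choices made only for illustration.

```python
import numpy as np

theta = 0.7  # arbitrary angle, chosen only for illustration
A = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])  # an orthogonal matrix

x = np.array([3.0, -1.0])
y = np.array([2.0, 5.0])

# Part (b): multiplication by A leaves norms unchanged.
norm_preserved = bool(np.isclose(np.linalg.norm(A @ x), np.linalg.norm(x)))

# Part (c): multiplication by A leaves dot products unchanged.
dot_preserved = bool(np.isclose((A @ x) @ (A @ y), x @ y))

# Consequence: distances between vectors are unchanged as well.
distance_preserved = bool(np.isclose(np.linalg.norm(A @ x - A @ y),
                                     np.linalg.norm(x - y)))
```

Because angles are computed from dot products and norms, the first two checks together also imply that the angle between `x` and `y` is preserved.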

Change of Orthonormal Basis

Orthonormal bases for inner product spaces are convenient because, as the following
theorem shows, many familiar formulas hold for such bases. We leave the proof as an
exercise.

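THEOREM 7.1.4 If $S$ is an orthonormal basis for an $n$-dimensional inner product space
$V$, and if

$$(\mathbf{u})_S = (u_1, u_2, \ldots, u_n) \quad\text{and}\quad (\mathbf{v})_S = (v_1, v_2, \ldots, v_n)$$

then:

(a) $\|\mathbf{u}\| = \sqrt{u_1^2 + u_2^2 + \cdots + u_n^2}$

(b) $d(\mathbf{u}, \mathbf{v}) = \sqrt{(u_1 - v_1)^2 + (u_2 - v_2)^2 + \cdots + (u_n - v_n)^2}$

(c) $\langle\mathbf{u}, \mathbf{v}\rangle = u_1v_1 + u_2v_2 + \cdots + u_nv_n$

Remark Note that the three parts of Theorem 7.1.4 can be expressed as

$$\|\mathbf{u}\| = \|(\mathbf{u})_S\| \qquad d(\mathbf{u}, \mathbf{v}) = d((\mathbf{u})_S, (\mathbf{v})_S) \qquad \langle\mathbf{u}, \mathbf{v}\rangle = \langle(\mathbf{u})_S, (\mathbf{v})_S\rangle$$

where the norm, distance, and inner product on the left sides are relative to the inner product on
$V$ and on the right sides are relative to the Euclidean inner product on $R^n$.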

Transitions between orthonormal bases for an inner product space are of special 
importance in geometry and various applications. The following theorem, whose proof 
is deferred to the end of this section, is concerned with transitions of this type. 

THEOREM 7.1.5 Let $V$ be a finite-dimensional inner product space. If $P$ is the transition
matrix from one orthonormal basis for $V$ to another orthonormal basis for $V$,
then $P$ is an orthogonal matrix.


► EXAMPLE 4 Rotation of Axes in 2-Space

In many problems a rectangular $xy$-coordinate system is given, and a new $x'y'$-coordinate
system is obtained by rotating the $xy$-system counterclockwise about the origin through
an angle $\theta$. When this is done, each point $Q$ in the plane has two sets of coordinates:
coordinates $(x, y)$ relative to the $xy$-system and coordinates $(x', y')$ relative to the
$x'y'$-system (Figure 7.1.2a).

By introducing unit vectors $\mathbf{u}_1$ and $\mathbf{u}_2$ along the positive $x$- and $y$-axes and unit vectors
$\mathbf{u}_1'$ and $\mathbf{u}_2'$ along the positive $x'$- and $y'$-axes, we can regard this rotation as a change
from an old basis $B = \{\mathbf{u}_1, \mathbf{u}_2\}$ to a new basis $B' = \{\mathbf{u}_1', \mathbf{u}_2'\}$ (Figure 7.1.2b). Thus, the
new coordinates $(x', y')$ and the old coordinates $(x, y)$ of a point $Q$ will be related by

$$\begin{bmatrix} x' \\ y' \end{bmatrix} = P^{-1}\begin{bmatrix} x \\ y \end{bmatrix} \tag{2}$$

where $P$ is the transition matrix from $B'$ to $B$. To find $P$ we must determine the coordinate
matrices of the new basis vectors $\mathbf{u}_1'$ and $\mathbf{u}_2'$ relative to the old basis. As indicated in
Figure 7.1.2c, the components of $\mathbf{u}_1'$ in the old basis are $\cos\theta$ and $\sin\theta$, so

$$[\mathbf{u}_1']_B = \begin{bmatrix} \cos\theta \\ \sin\theta \end{bmatrix}$$

Similarly, from Figure 7.1.2d we see that the components of $\mathbf{u}_2'$ in the old basis are
$\cos(\theta + \pi/2) = -\sin\theta$ and $\sin(\theta + \pi/2) = \cos\theta$, so

$$[\mathbf{u}_2']_B = \begin{bmatrix} -\sin\theta \\ \cos\theta \end{bmatrix}$$

Thus the transition matrix from $B'$ to $B$ is

$$P = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \tag{3}$$

Observe that $P$ is an orthogonal matrix, as expected, since $B$ and $B'$ are orthonormal
bases. Thus

$$P^{-1} = P^T = \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix}$$

so (2) yields

$$\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} \tag{4}$$

or, equivalently,

$$\begin{aligned} x' &= x\cos\theta + y\sin\theta \\ y' &= -x\sin\theta + y\cos\theta \end{aligned} \tag{5}$$

These are sometimes called the rotation equations for $R^2$.

▲ Figure 7.1.2 [Figure not reproduced.]


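► EXAMPLE 5 Rotation of Axes in 2-Space

Use form (4) of the rotation equations for $R^2$ to find the new coordinates of the point
$Q(2, -1)$ if the coordinate axes of a rectangular coordinate system are rotated through an
angle of $\theta = \pi/4$.

Solution Since

$$\sin\frac{\pi}{4} = \cos\frac{\pi}{4} = \frac{1}{\sqrt{2}}$$

the equation in (4) becomes

$$\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix}$$

Thus, if the old coordinates of a point $Q$ are $(x, y) = (2, -1)$, then

$$\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{bmatrix}\begin{bmatrix} 2 \\ -1 \end{bmatrix} = \begin{bmatrix} \frac{1}{\sqrt{2}} \\ -\frac{3}{\sqrt{2}} \end{bmatrix}$$

so the new coordinates of $Q$ are $(x', y') = \left(\frac{1}{\sqrt{2}}, -\frac{3}{\sqrt{2}}\right)$. ◄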


Remark Observe that the coefficient matrix in (4) is the same as the standard matrix for the
linear operator that rotates the vectors of $R^2$ through the angle $-\theta$ (see margin note for Table 5
of Section 4.9). This is to be expected since rotating the coordinate axes through the angle $\theta$ with
the vectors of $R^2$ kept fixed has the same effect as rotating the vectors in $R^2$ through the angle $-\theta$
with the axes kept fixed.
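The rotation equations translate directly into a short computation. The following sketch assumes Python with NumPy, which the text does not use, and the function name `rotate_axes` is hypothetical. It applies form (4) to the point with old coordinates $(2, -1)$ used in the solution of Example 5.

```python
import numpy as np

def rotate_axes(point, theta):
    """New coordinates of `point` after the axes are rotated
    counterclockwise through theta (rotation equations, form (4))."""
    c, s = np.cos(theta), np.sin(theta)
    P_inv = np.array([[c,  s],
                      [-s, c]])  # P^{-1} = P^T because P is orthogonal
    return P_inv @ np.asarray(point, dtype=float)

new_coords = rotate_axes((2.0, -1.0), np.pi / 4)
# Agrees with the hand computation (1/sqrt(2), -3/sqrt(2)) up to rounding.
```

Applying the transpose of `P_inv` to `new_coords` recovers the old coordinates, which is the numerical counterpart of the statement that $P$ is the transition matrix from $B'$ to $B$.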



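► EXAMPLE 6 Application to Rotation of Axes in 3-Space

Suppose that a rectangular $xyz$-coordinate system is rotated around its $z$-axis counterclockwise
(looking down the positive $z$-axis) through an angle $\theta$ (Figure 7.1.3). If we
introduce unit vectors $\mathbf{u}_1$, $\mathbf{u}_2$, and $\mathbf{u}_3$ along the positive $x$-, $y$-, and $z$-axes and unit vectors
$\mathbf{u}_1'$, $\mathbf{u}_2'$, and $\mathbf{u}_3'$ along the positive $x'$-, $y'$-, and $z'$-axes, we can regard the rotation as
a change from the old basis $B = \{\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3\}$ to the new basis $B' = \{\mathbf{u}_1', \mathbf{u}_2', \mathbf{u}_3'\}$. In light
of Example 4, it should be evident that

$$[\mathbf{u}_1']_B = \begin{bmatrix} \cos\theta \\ \sin\theta \\ 0 \end{bmatrix} \quad\text{and}\quad [\mathbf{u}_2']_B = \begin{bmatrix} -\sin\theta \\ \cos\theta \\ 0 \end{bmatrix}$$

Moreover, since $\mathbf{u}_3'$ extends 1 unit up the positive $z'$-axis,

$$[\mathbf{u}_3']_B = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}$$

It follows that the transition matrix from $B'$ to $B$ is

$$P = \begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

and the transition matrix from $B$ to $B'$ is

$$P^{-1} = \begin{bmatrix} \cos\theta & \sin\theta & 0 \\ -\sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

(verify). Thus, the new coordinates $(x', y', z')$ of a point $Q$ can be computed from its old
coordinates $(x, y, z)$ by

$$\begin{bmatrix} x' \\ y' \\ z' \end{bmatrix} = \begin{bmatrix} \cos\theta & \sin\theta & 0 \\ -\sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} x \\ y \\ z \end{bmatrix}$$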


OPTIONAL

We conclude this section with an optional proof of Theorem 7.1.5.

Recall that $(\mathbf{u})_S$ denotes a coordinate vector expressed in comma-delimited form
whereas $[\mathbf{u}]_S$ denotes a coordinate vector expressed in column form.

Proof of Theorem 7.1.5 Assume that $V$ is an $n$-dimensional inner product space and
that $P$ is the transition matrix from an orthonormal basis $B'$ to an orthonormal basis
$B$. We will denote the norm relative to the inner product on $V$ by the symbol $\|\ \|_V$ to
distinguish it from the norm relative to the Euclidean inner product on $R^n$, which we
will denote by $\|\ \|$.

To prove that $P$ is orthogonal, we will use Theorem 7.1.3 and show that $\|P\mathbf{x}\| = \|\mathbf{x}\|$
for every vector $\mathbf{x}$ in $R^n$. As a first step in this direction, recall from Theorem 7.1.4(a)
that for any orthonormal basis for $V$ the norm of any vector $\mathbf{u}$ in $V$ is the same as the
norm of its coordinate vector with respect to the Euclidean inner product, that is,

$$\|\mathbf{u}\|_V = \|[\mathbf{u}]_{B'}\| = \|[\mathbf{u}]_B\|$$

or, since $[\mathbf{u}]_B = P[\mathbf{u}]_{B'}$,

$$\|\mathbf{u}\|_V = \|[\mathbf{u}]_{B'}\| = \|P[\mathbf{u}]_{B'}\| \tag{6}$$

Now let $\mathbf{x}$ be any vector in $R^n$, and let $\mathbf{u}$ be the vector in $V$ whose coordinate vector with
respect to the basis $B'$ is $\mathbf{x}$, that is, $[\mathbf{u}]_{B'} = \mathbf{x}$. Thus, from (6),

$$\|\mathbf{u}\|_V = \|\mathbf{x}\| = \|P\mathbf{x}\|$$

which proves that $P$ is orthogonal.


Exercise Set 7.1 

In each part of Exercises 1-4, determine whether the matrix is
orthogonal, and if so find its inverse.


1 . (a) 

"l o' 

_0 - 1 _ 

(b) 

- 1 

V2 

1 

1 “ 

V2 

1 



LV2 

V2 J 


2. (a) 

T o' 

0 1_ 

(b) 

- 1 

x/5 

2 

2 " 

Vs 

1 



- >/5 

Vs - 



'0 

1 

1 - 

V2 


1 

Vi 

l 

V6 

1 “ 

V3 

3. (a) 

1 

0 

0 

(b) 

0 

2 

V6 

1 

V3 


0 

0 

1 

Vi _ 


l 

- Vi 

J_ 

•s/6 

1 

x/3 _ 


"1111“ 


1 

O 

o 

o 

1 

2 2 2 2 


15 11 

2 6 6 6 

(b) 

0 

-id- 

1 

o 

1115 

2 6 6 6 

o 

-K 

o 

115 1 

_ 2 6 6 6 _ 


1 

o 

id- 

K>l — 

o 

1 


In Exercises 5-6, show that the matrix is orthogonal three ways: 
first by calculating A T A, then by using part ( b ) of Theorem 7.1.1, 
and then by using part (c) of Theorem 7.1.1. 


r 4 

o 

3-i 


- l 

2 

2-i 

5 


5 


3 

3 

3 

9 

4 

12 

6. A = 

2 

2 

1 

25 

5 

25 


3 

3 

3 

12 

3 

16 


2 

1 

2 

L 25 

5 

25 -J 


L 3 

3 

3-J 


7. Let $T_A: R^3 \to R^3$ be multiplication by the orthogonal matrix
in Exercise 5. Find $T_A(\mathbf{x})$ for the vector $\mathbf{x} = (-2, 3, 5)$, and
confirm that $\|T_A(\mathbf{x})\| = \|\mathbf{x}\|$ relative to the Euclidean inner
product on $R^3$.

8. Let $T_A: R^3 \to R^3$ be multiplication by the orthogonal matrix in
Exercise 6. Find $T_A(\mathbf{x})$ for the vector $\mathbf{x} = (0, 1, 4)$, and confirm
that $\|T_A(\mathbf{x})\| = \|\mathbf{x}\|$ relative to the Euclidean inner product
on $R^3$.

9. Are the standard matrices for the reflections in Tables 1 and 2
of Section 4.9 orthogonal?

10. Are the standard matrices for the orthogonal projections in
Tables 3 and 4 of Section 4.9 orthogonal?

11. What conditions must $a$ and $b$ satisfy for the matrix

$$\begin{bmatrix} a + b & b - a \\ a - b & b + a \end{bmatrix}$$

to be orthogonal?

12. Under what conditions will a diagonal matrix be orthogonal?

13. Let a rectangular $x'y'$-coordinate system be obtained by rotating
a rectangular $xy$-coordinate system counterclockwise
through the angle $\theta = \pi/3$.

(a) Find the $x'y'$-coordinates of the point whose
$xy$-coordinates are $(-2, 6)$.

(b) Find the $xy$-coordinates of the point whose
$x'y'$-coordinates are $(5, 2)$.

14. Repeat Exercise 13 with $\theta = 3\pi/4$.

15. Let a rectangular $x'y'z'$-coordinate system be obtained by rotating
a rectangular $xyz$-coordinate system counterclockwise
about the $z$-axis (looking down the $z$-axis) through the angle
$\theta = \pi/4$.

(a) Find the $x'y'z'$-coordinates of the point whose
$xyz$-coordinates are $(-1, 2, 5)$.

(b) Find the $xyz$-coordinates of the point whose
$x'y'z'$-coordinates are $(1, 6, -3)$.

16. Repeat Exercise 15 for a rotation of $\theta = 3\pi/4$ counterclockwise
about the $x$-axis (looking along the positive $x$-axis toward
the origin).

17. Repeat Exercise 15 for a rotation of $\theta = \pi/3$ counterclockwise
about the $y$-axis (looking along the positive $y$-axis toward the
origin).

18. A rectangular $x'y'z'$-coordinate system is obtained by rotating
an $xyz$-coordinate system counterclockwise about the $y$-axis
through an angle $\theta$ (looking along the positive $y$-axis toward
the origin). Find a matrix $A$ such that

$$\begin{bmatrix} x' \\ y' \\ z' \end{bmatrix} = A\begin{bmatrix} x \\ y \\ z \end{bmatrix}$$

where $(x, y, z)$ and $(x', y', z')$ are the coordinates of the same
point in the $xyz$- and $x'y'z'$-systems, respectively.

19. Repeat Exercise 18 for a rotation about the $x$-axis.

20. A rectangular $x''y''z''$-coordinate system is obtained by first
rotating a rectangular $xyz$-coordinate system 60° counterclockwise
about the $z$-axis (looking down the positive $z$-axis)
to obtain an $x'y'z'$-coordinate system, and then rotating the
$x'y'z'$-coordinate system 45° counterclockwise about the $y'$-axis
(looking along the positive $y'$-axis toward the origin).
Find a matrix $A$ such that

$$\begin{bmatrix} x'' \\ y'' \\ z'' \end{bmatrix} = A\begin{bmatrix} x \\ y \\ z \end{bmatrix}$$

where $(x, y, z)$ and $(x'', y'', z'')$ are the $xyz$- and $x''y''z''$-coordinates
of the same point.

21. A linear operator on $R^2$ is called rigid if it does not change the
lengths of vectors, and it is called angle preserving if it does
not change the angle between nonzero vectors.

(a) Identify two different types of linear operators that are
rigid.

(b) Identify two different types of linear operators that are
angle preserving.

(c) Are there any linear operators on $R^2$ that are rigid and not
angle preserving? Angle preserving and not rigid? Justify
your answer.

22. Can an orthogonal operator $T_A: R^n \to R^n$ map nonzero vectors
that are not orthogonal into orthogonal vectors? Justify
your answer.

23. The set $S = \left\{\frac{1}{\sqrt{3}},\ \frac{1}{\sqrt{2}}x,\ \sqrt{\frac{3}{2}}\,x^2 - \sqrt{\frac{2}{3}}\right\}$ is an orthonormal basis
for $P_2$ with respect to the evaluation inner product at the
points $x_0 = -1$, $x_1 = 0$, $x_2 = 1$. Let $\mathbf{p} = p(x) = 1 + x + x^2$
and $\mathbf{q} = q(x) = 2x - x^2$.

(a) Find $(\mathbf{p})_S$ and $(\mathbf{q})_S$.

(b) Use Theorem 7.1.4 to compute $\|\mathbf{p}\|$, $d(\mathbf{p}, \mathbf{q})$ and $\langle\mathbf{p}, \mathbf{q}\rangle$.

24. The sets $S = \{1, x\}$ and $S' = \left\{\frac{1}{\sqrt{2}}(1 + x),\ \frac{1}{\sqrt{2}}(1 - x)\right\}$ are orthonormal
bases for $P_1$ with respect to the standard inner
product. Find the transition matrix $P$ from $S$ to $S'$, and verify
that the conclusion of Theorem 7.1.5 holds for $P$.


Working with Proofs 

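25. Prove that if $\mathbf{x}$ is a nonzero $n \times 1$ matrix, then the matrix

$$A = I_n - \frac{2}{\mathbf{x}^T\mathbf{x}}\,\mathbf{x}\mathbf{x}^T$$

is both orthogonal and symmetric.

26. Prove that a $2 \times 2$ orthogonal matrix $A$ has only one of two
possible forms:

$$A = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \quad\text{or}\quad A = \begin{bmatrix} \cos\theta & \sin\theta \\ \sin\theta & -\cos\theta \end{bmatrix}$$

where $0 \le \theta < 2\pi$. [Hint: Start with a general $2 \times 2$ matrix $A$,
and use the fact that the column vectors form an orthonormal
set in $R^2$.]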


27. (a) Use the result in Exercise 26 to prove that multiplication 

by a 2 x 2 orthogonal matrix is a rotation if det(A) = 1 
and a reflection followed by a rotation if det( A) = — 1 . 

(b) In the case where the transformation in part (a) is a reflec- 
tion followed by a rotation, show that the same transfor- 
mation can be accomplished by a single reflection about 
an appropriate line through the origin. What is that line? 
[Hint: See Formula (6) of Section 4.9.] 

28. In each part, use the result in Exercise 27(a) to determine 
whether multiplication by A is a rotation or a reflection fol- 
lowed by rotation. Find the angle of rotation in both cases, 
and in the case where it is a reflection followed by a rotation 
find an equation for the line through the origin referenced in 
Exercise 27(b). 


r 1 

1 -| 


1 V3~ 

V2 

V2 

(b) A = 

2 2 

1 

1 


V3 1 

L V2 

V2J 


2 2 _ 


29. The result in Exercise 27(a) has an analog for $3 \times 3$ orthogonal
matrices. It can be proved that multiplication by a $3 \times 3$
orthogonal matrix $A$ is a rotation about some line through the
origin of $R^3$ if $\det(A) = 1$ and is a reflection about some coordinate
plane followed by a rotation about some line through
the origin if $\det(A) = -1$. Use the first of these facts and Theorem
7.1.2 to prove that any composition of rotations about
lines through the origin in $R^3$ can be accomplished by a single
rotation about an appropriate line through the origin.

30. Euler's Axis of Rotation Theorem states that: If $A$ is an orthogonal
$3 \times 3$ matrix for which $\det(A) = 1$, then multiplication by
$A$ is a rotation about a line through the origin in $R^3$. Moreover,
if $\mathbf{u}$ is a unit vector along this line, then $A\mathbf{u} = \mathbf{u}$.

(a) Confirm that the following matrix $A$ is orthogonal, that
$\det(A) = 1$, and that there is a unit vector $\mathbf{u}$ for which
$A\mathbf{u} = \mathbf{u}$.

$$A = \begin{bmatrix} \frac{2}{7} & \frac{3}{7} & \frac{6}{7} \\ \frac{3}{7} & -\frac{6}{7} & \frac{2}{7} \\ \frac{6}{7} & \frac{2}{7} & -\frac{3}{7} \end{bmatrix}$$

(b) Use Formula (3) of Section 4.9 to prove that if $A$ is a $3 \times 3$
orthogonal matrix for which $\det(A) = 1$, then the angle
of rotation resulting from multiplication by $A$ satisfies the
equation $\cos\theta = \frac{1}{2}[\operatorname{tr}(A) - 1]$. Use this result to find the
angle of rotation for the rotation matrix in part (a).

31. Prove the equivalence of statements (a) and (c) that are given
in Theorem 7.1.1.


True-False Exercises 

TF. In parts (a)-(h) determine whether the statement is true or 
false, and justify your answer. 


(a) The matrix $\begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix}$ is orthogonal.

(b) The matrix $\begin{bmatrix} 1 & -2 \\ 2 & 1 \end{bmatrix}$ is orthogonal.

(c) An $m \times n$ matrix $A$ is orthogonal if $A^TA = I$.


(d) A square matrix whose columns form an orthogonal set is 
orthogonal. 


(e) Every orthogonal matrix is invertible. 

(f) If A is an orthogonal matrix, then A 1 is orthogonal and 
(detA) 2 = 1. 


(g) Every eigenvalue of an orthogonal matrix has absolute value 1 . 

(h) If A is a square matrix and ||Au|| = 1 for all unit vectors u, 
then A is orthogonal. 

Working with Technology 

T1. If $\mathbf{a}$ is a nonzero vector in $R^n$, then $\mathbf{a}\mathbf{a}^T$ is called the outer
product of $\mathbf{a}$ with itself, the subspace $\mathbf{a}^\perp$ is called the hyperplane in
$R^n$ orthogonal to $\mathbf{a}$, and the $n \times n$ orthogonal matrix

$$H_{\mathbf{a}^\perp} = I - \frac{2}{\mathbf{a}^T\mathbf{a}}\,\mathbf{a}\mathbf{a}^T$$

is called the Householder matrix or the Householder reflection
about $\mathbf{a}^\perp$, named in honor of the American mathematician Alston
S. Householder (1904-1993). In $R^2$ the matrix $H_{\mathbf{a}^\perp}$ represents
a reflection about the line through the origin that is orthogonal to
$\mathbf{a}$, and in $R^3$ it represents a reflection about the plane through the
origin that is orthogonal to $\mathbf{a}$. In higher dimensions we can view
$H_{\mathbf{a}^\perp}$ as a "reflection" about the hyperplane $\mathbf{a}^\perp$. Householder reflections
are important in large-scale implementations of numerical
algorithms, particularly $QR$-decompositions, because they can be
used to transform a given vector into a vector with specified zero
components while leaving the other components unchanged. This
is a consequence of the following theorem [see Contemporary Linear
Algebra, by Howard Anton and Robert C. Busby (Hoboken,
NJ: John Wiley & Sons, 2003, p. 422)].

Theorem. If $\mathbf{v}$ and $\mathbf{w}$ are distinct vectors in $R^n$ with the same
norm, then the Householder reflection about the hyperplane
$(\mathbf{v} - \mathbf{w})^\perp$ maps $\mathbf{v}$ into $\mathbf{w}$ and conversely.

(a) Find a Householder reflection that maps the vector
$\mathbf{v} = (4, 2, 4)$ into a vector $\mathbf{w}$ that has zeros as its second
and third components. Find $\mathbf{w}$.

(b) Find a Householder reflection that maps the vector
$\mathbf{v} = (3, 4, 2, 4)$ into the vector whose last two entries are
zero, while leaving the first entry unchanged. Find $\mathbf{w}$.
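As a starting point for the technology exercise, the Householder matrix can be built directly from its definition. The sketch below assumes Python with NumPy (any of the tools named in the preface would serve equally well), the function name `householder` is hypothetical, and the vectors are chosen to differ from those in parts (a) and (b).

```python
import numpy as np

def householder(a):
    """Householder matrix I - (2 / a^T a) a a^T for a nonzero vector a."""
    a = np.asarray(a, dtype=float).reshape(-1, 1)  # column form
    return np.eye(a.shape[0]) - 2.0 * (a @ a.T) / (a.T @ a).item()

# Two distinct vectors with the same norm (both have norm 3).
v = np.array([1.0, 2.0, 2.0])
w = np.array([3.0, 0.0, 0.0])

# By the theorem quoted above, the reflection about (v - w)^perp swaps v and w.
H = householder(v - w)
Hv = H @ v   # equals w up to rounding
Hw = H @ w   # equals v up to rounding
```

The same two lines that build `H` from `v - w` apply unchanged to the vectors in the exercise once a suitable target `w` with the same norm as `v` has been chosen.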


7.2 Orthogonal Diagonalization 

In this section we will be concerned with the problem of diagonalizing a symmetric matrix
$A$. As we will see, this problem is closely related to that of finding an orthonormal basis for
$R^n$ that consists of eigenvectors of $A$. Problems of this type are important because many of
the matrices that arise in applications are symmetric.

The Orthogonal Diagonalization Problem

In Section 5.2 we defined two square matrices, $A$ and $B$, to be similar if there is an
invertible matrix $P$ such that $P^{-1}AP = B$. In this section we will be concerned with
the special case in which it is possible to find an orthogonal matrix $P$ for which this
relationship holds.

We begin with the following definition.

DEFINITION 1 If $A$ and $B$ are square matrices, then we say that $B$ is orthogonally
similar to $A$ if there is an orthogonal matrix $P$ such that $B = P^TAP$.

Note that if $B$ is orthogonally similar to $A$, then it is also true that $A$ is orthogonally
similar to $B$ since we can express $A$ as $A = Q^TBQ$ by taking $Q = P^T$ (verify). This
being the case we will say that $A$ and $B$ are orthogonally similar matrices if either is
orthogonally similar to the other.

If $A$ is orthogonally similar to some diagonal matrix, say

$$P^TAP = D$$

then we say that $A$ is orthogonally diagonalizable and that $P$ orthogonally diagonalizes $A$.

Conditions for Orthogonal Diagonalizability

Our first goal in this section is to determine what conditions a matrix must satisfy
to be orthogonally diagonalizable. As an initial step, observe that there is no hope of
orthogonally diagonalizing a matrix that is not symmetric. To see why this is so, suppose
that

$$P^TAP = D \tag{1}$$

where $P$ is an orthogonal matrix and $D$ is a diagonal matrix. Multiplying both sides
of (1) on the left by $P$ and on the right by $P^T$, and then using the fact that $PP^T = P^TP = I$, we can
rewrite this equation as

$$A = PDP^T \tag{2}$$

Now transposing both sides of this equation and using the fact that a diagonal matrix is
the same as its transpose we obtain

$$A^T = (PDP^T)^T = (P^T)^TD^TP^T = PDP^T = A$$

so $A$ must be symmetric if it is orthogonally diagonalizable.

The following theorem shows that every symmetric matrix with real entries is, in fact,
orthogonally diagonalizable. In this theorem, and for the remainder of this section,
orthogonal will mean orthogonal with respect to the Euclidean inner product on $R^n$.


THEOREM 7.2.1 If $A$ is an $n \times n$ matrix with real entries, then the following are equivalent.

(a) $A$ is orthogonally diagonalizable.

(b) $A$ has an orthonormal set of $n$ eigenvectors.

(c) $A$ is symmetric.

Proof (a) ⇒ (b) Since $A$ is orthogonally diagonalizable, there is an orthogonal matrix $P$
such that $P^{-1}AP$ is diagonal. As shown in Formula (2) in the proof of Theorem 5.2.1,
the $n$ column vectors of $P$ are eigenvectors of $A$. Since $P$ is orthogonal, these column
vectors are orthonormal, so $A$ has $n$ orthonormal eigenvectors.

(b) ⇒ (a) Assume that $A$ has an orthonormal set of $n$ eigenvectors $\{\mathbf{p}_1, \mathbf{p}_2, \ldots, \mathbf{p}_n\}$. As
shown in the proof of Theorem 5.2.1, the matrix $P$ with these eigenvectors as columns
diagonalizes $A$. Since these eigenvectors are orthonormal, $P$ is orthogonal and thus
orthogonally diagonalizes $A$.

(a) ⇒ (c) In the proof that (a) ⇒ (b) we showed that an orthogonally diagonalizable
$n \times n$ matrix $A$ is orthogonally diagonalized by an $n \times n$ matrix $P$ whose columns form
an orthonormal set of eigenvectors of $A$. Let $D$ be the diagonal matrix

$$D = P^TAP$$

from which it follows that

$$A = PDP^T$$

Thus,

$$A^T = (PDP^T)^T = PD^TP^T = PDP^T = A$$

which shows that $A$ is symmetric.

(c) ⇒ (a) The proof of this part is beyond the scope of this text. However, because it is
such an important result we have outlined the structure of its proof in the exercises (see
Exercise 31).

Properties of Symmetric Matrices

Our next goal is to devise a procedure for orthogonally diagonalizing a symmetric matrix,
but before we can do so, we need the following critical theorem about eigenvalues and
eigenvectors of symmetric matrices.


THEOREM 7.2.2 If $A$ is a symmetric matrix with real entries, then:

(a) The eigenvalues of $A$ are all real numbers.

(b) Eigenvectors from different eigenspaces are orthogonal.

Part (a), which requires results about complex vector spaces, will be discussed in
Section 7.5.

Proof (b) Let $\mathbf{v}_1$ and $\mathbf{v}_2$ be eigenvectors corresponding to distinct eigenvalues $\lambda_1$ and $\lambda_2$
of the matrix $A$. We want to show that $\mathbf{v}_1\cdot\mathbf{v}_2 = 0$. Our proof of this involves the trick
of starting with the expression $A\mathbf{v}_1\cdot\mathbf{v}_2$. It follows from Formula (26) of Section 3.2 and
the symmetry of $A$ that

$$A\mathbf{v}_1\cdot\mathbf{v}_2 = \mathbf{v}_1\cdot A^T\mathbf{v}_2 = \mathbf{v}_1\cdot A\mathbf{v}_2 \tag{3}$$

But $\mathbf{v}_1$ is an eigenvector of $A$ corresponding to $\lambda_1$, and $\mathbf{v}_2$ is an eigenvector of $A$ corresponding
to $\lambda_2$, so (3) yields the relationship

$$\lambda_1\mathbf{v}_1\cdot\mathbf{v}_2 = \mathbf{v}_1\cdot\lambda_2\mathbf{v}_2$$

which can be rewritten as

$$(\lambda_1 - \lambda_2)(\mathbf{v}_1\cdot\mathbf{v}_2) = 0 \tag{4}$$

But $\lambda_1 - \lambda_2 \neq 0$, since $\lambda_1$ and $\lambda_2$ were assumed distinct. Thus, it follows from (4) that
$\mathbf{v}_1\cdot\mathbf{v}_2 = 0$.

Theorem 7.2.2 yields the following procedure for orthogonally diagonalizing a symmetric
matrix.


Orthogonally Diagonalizing an n x n Symmetric Matrix 

Step 1. Find a basis for each eigenspace of A. 

Step 2. Apply the Gram-Schmidt process to each of these bases to obtain an or- 
thonormal basis for each eigenspace. 

Step 3. Form the matrix $P$ whose columns are the vectors constructed in Step 2. This
matrix will orthogonally diagonalize $A$, and the eigenvalues on the diagonal
of $D = P^TAP$ will be in the same order as their corresponding eigenvectors
in $P$.

Remark The justification of this procedure should be clear: Theorem 7.2.2 ensures that eigenvectors
from different eigenspaces are orthogonal, and applying the Gram-Schmidt process ensures
that the eigenvectors within the same eigenspace are orthonormal. Thus the entire set of eigenvectors
obtained by this procedure will be orthonormal.


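► EXAMPLE 1 Orthogonally Diagonalizing a Symmetric Matrix

Find an orthogonal matrix $P$ that diagonalizes

$$A = \begin{bmatrix} 4 & 2 & 2 \\ 2 & 4 & 2 \\ 2 & 2 & 4 \end{bmatrix}$$

Solution We leave it for you to verify that the characteristic equation of $A$ is

$$\det(\lambda I - A) = \det\begin{bmatrix} \lambda - 4 & -2 & -2 \\ -2 & \lambda - 4 & -2 \\ -2 & -2 & \lambda - 4 \end{bmatrix} = (\lambda - 2)^2(\lambda - 8) = 0$$

Thus, the distinct eigenvalues of $A$ are $\lambda = 2$ and $\lambda = 8$. By the method used in Example 7
of Section 5.1, it can be shown that

$$\mathbf{u}_1 = \begin{bmatrix} -1 \\ 1 \\ 0 \end{bmatrix} \quad\text{and}\quad \mathbf{u}_2 = \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix} \tag{5}$$

form a basis for the eigenspace corresponding to $\lambda = 2$. Applying the Gram-Schmidt
process to $\{\mathbf{u}_1, \mathbf{u}_2\}$ yields the following orthonormal eigenvectors (verify):

$$\mathbf{v}_1 = \begin{bmatrix} -\frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} \\ 0 \end{bmatrix} \quad\text{and}\quad \mathbf{v}_2 = \begin{bmatrix} -\frac{1}{\sqrt{6}} \\ -\frac{1}{\sqrt{6}} \\ \frac{2}{\sqrt{6}} \end{bmatrix} \tag{6}$$

The eigenspace corresponding to $\lambda = 8$ has

$$\mathbf{u}_3 = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$$

as a basis. Applying the Gram-Schmidt process to $\{\mathbf{u}_3\}$ (i.e., normalizing $\mathbf{u}_3$) yields

$$\mathbf{v}_3 = \begin{bmatrix} \frac{1}{\sqrt{3}} \\ \frac{1}{\sqrt{3}} \\ \frac{1}{\sqrt{3}} \end{bmatrix}$$

Finally, using $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$ as column vectors, we obtain

$$P = \begin{bmatrix} -\frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{6}} & \frac{1}{\sqrt{3}} \\ \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{6}} & \frac{1}{\sqrt{3}} \\ 0 & \frac{2}{\sqrt{6}} & \frac{1}{\sqrt{3}} \end{bmatrix}$$

which orthogonally diagonalizes $A$. As a check, we leave it for you to confirm that

$$P^TAP = \begin{bmatrix} -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} & 0 \\ -\frac{1}{\sqrt{6}} & -\frac{1}{\sqrt{6}} & \frac{2}{\sqrt{6}} \\ \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{3}} \end{bmatrix} \begin{bmatrix} 4 & 2 & 2 \\ 2 & 4 & 2 \\ 2 & 2 & 4 \end{bmatrix} \begin{bmatrix} -\frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{6}} & \frac{1}{\sqrt{3}} \\ \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{6}} & \frac{1}{\sqrt{3}} \\ 0 & \frac{2}{\sqrt{6}} & \frac{1}{\sqrt{3}} \end{bmatrix} = \begin{bmatrix} 2 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 8 \end{bmatrix}$$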
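The result of Example 1 can be cross-checked with software. The sketch below assumes Python with NumPy, which the text does not use; `numpy.linalg.eigh` is NumPy's eigenvalue routine for symmetric matrices and takes the place of the hand computation with the Gram-Schmidt process.

```python
import numpy as np

A = np.array([[4.0, 2.0, 2.0],
              [2.0, 4.0, 2.0],
              [2.0, 2.0, 4.0]])

# eigh returns the eigenvalues in ascending order together with a matrix
# whose columns are corresponding orthonormal eigenvectors.
eigenvalues, P = np.linalg.eigh(A)

# P^T A P should be diagonal, with the eigenvalues in the same order as the
# corresponding columns of P.
D = P.T @ A @ P
```

Because the eigenspace for $\lambda = 2$ is two-dimensional, its orthonormal basis is not unique, so the columns of `P` returned here need not match $\mathbf{v}_1$ and $\mathbf{v}_2$ from the example, although both matrices orthogonally diagonalize $A$.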




Spectral Decomposition 


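If $A$ is a symmetric matrix that is orthogonally diagonalized by

$$P = [\mathbf{u}_1 \quad \mathbf{u}_2 \quad \cdots \quad \mathbf{u}_n]$$

and if $\lambda_1, \lambda_2, \ldots, \lambda_n$ are the eigenvalues of $A$ corresponding to the unit eigenvectors
$\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_n$, then we know that $D = P^TAP$, where $D$ is a diagonal matrix with the
eigenvalues in the diagonal positions. It follows from this that the matrix $A$ can be
expressed as

$$A = PDP^T = [\mathbf{u}_1 \quad \mathbf{u}_2 \quad \cdots \quad \mathbf{u}_n] \begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{bmatrix} \begin{bmatrix} \mathbf{u}_1^T \\ \mathbf{u}_2^T \\ \vdots \\ \mathbf{u}_n^T \end{bmatrix} = [\lambda_1\mathbf{u}_1 \quad \lambda_2\mathbf{u}_2 \quad \cdots \quad \lambda_n\mathbf{u}_n] \begin{bmatrix} \mathbf{u}_1^T \\ \mathbf{u}_2^T \\ \vdots \\ \mathbf{u}_n^T \end{bmatrix}$$

Multiplying out, we obtain the formula

$$A = \lambda_1\mathbf{u}_1\mathbf{u}_1^T + \lambda_2\mathbf{u}_2\mathbf{u}_2^T + \cdots + \lambda_n\mathbf{u}_n\mathbf{u}_n^T \tag{7}$$

which is called a spectral decomposition of $A$.

Note that each term of the spectral decomposition of $A$ has the form $\lambda\mathbf{u}\mathbf{u}^T$, where
$\mathbf{u}$ is a unit eigenvector of $A$ in column form, and $\lambda$ is an eigenvalue of $A$ corresponding to
$\mathbf{u}$. Since $\mathbf{u}$ has size $n \times 1$, it follows that the product $\mathbf{u}\mathbf{u}^T$ has size $n \times n$. It can be proved
(though we will not do it) that $\mathbf{u}\mathbf{u}^T$ is the standard matrix for the orthogonal projection
of $R^n$ on the subspace spanned by the vector $\mathbf{u}$. Accepting this to be so, the spectral
decomposition of $A$ tells us that the image of a vector $\mathbf{x}$ under multiplication by a symmetric
matrix $A$ can be obtained by projecting $\mathbf{x}$ orthogonally on the lines (one-dimensional
subspaces) determined by the eigenvectors of $A$, then scaling those projections by the
eigenvalues, and then adding the scaled projections. Here is an example.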


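► EXAMPLE 2 A Geometric Interpretation of a Spectral Decomposition

The matrix

$$A = \begin{bmatrix} 1 & 2 \\ 2 & -2 \end{bmatrix}$$

has eigenvalues $\lambda_1 = -3$ and $\lambda_2 = 2$ with corresponding eigenvectors

$$\mathbf{x}_1 = \begin{bmatrix} 1 \\ -2 \end{bmatrix} \quad\text{and}\quad \mathbf{x}_2 = \begin{bmatrix} 2 \\ 1 \end{bmatrix}$$

(verify). Normalizing these basis vectors yields

$$\mathbf{u}_1 = \frac{\mathbf{x}_1}{\|\mathbf{x}_1\|} = \begin{bmatrix} \frac{1}{\sqrt{5}} \\ -\frac{2}{\sqrt{5}} \end{bmatrix} \quad\text{and}\quad \mathbf{u}_2 = \frac{\mathbf{x}_2}{\|\mathbf{x}_2\|} = \begin{bmatrix} \frac{2}{\sqrt{5}} \\ \frac{1}{\sqrt{5}} \end{bmatrix}$$

so a spectral decomposition of $A$ is

$$\begin{bmatrix} 1 & 2 \\ 2 & -2 \end{bmatrix} = \lambda_1\mathbf{u}_1\mathbf{u}_1^T + \lambda_2\mathbf{u}_2\mathbf{u}_2^T = (-3)\begin{bmatrix} \frac{1}{\sqrt{5}} \\ -\frac{2}{\sqrt{5}} \end{bmatrix}\begin{bmatrix} \frac{1}{\sqrt{5}} & -\frac{2}{\sqrt{5}} \end{bmatrix} + (2)\begin{bmatrix} \frac{2}{\sqrt{5}} \\ \frac{1}{\sqrt{5}} \end{bmatrix}\begin{bmatrix} \frac{2}{\sqrt{5}} & \frac{1}{\sqrt{5}} \end{bmatrix}$$

$$= (-3)\begin{bmatrix} \frac{1}{5} & -\frac{2}{5} \\ -\frac{2}{5} & \frac{4}{5} \end{bmatrix} + (2)\begin{bmatrix} \frac{4}{5} & \frac{2}{5} \\ \frac{2}{5} & \frac{1}{5} \end{bmatrix} \tag{8}$$

where, as noted above, the $2 \times 2$ matrices on the right side of (8) are the standard matrices
for the orthogonal projections onto the eigenspaces corresponding to the eigenvalues
$\lambda_1 = -3$ and $\lambda_2 = 2$, respectively.

Now let us see what this spectral decomposition tells us about the image of the vector
$\mathbf{x} = (1, 1)$ under multiplication by $A$. Writing $\mathbf{x}$ in column form, it follows that

$$A\mathbf{x} = \begin{bmatrix} 1 & 2 \\ 2 & -2 \end{bmatrix}\begin{bmatrix} 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 3 \\ 0 \end{bmatrix} \tag{9}$$

and from (8) that

$$A\mathbf{x} = (-3)\begin{bmatrix} \frac{1}{5} & -\frac{2}{5} \\ -\frac{2}{5} & \frac{4}{5} \end{bmatrix}\begin{bmatrix} 1 \\ 1 \end{bmatrix} + (2)\begin{bmatrix} \frac{4}{5} & \frac{2}{5} \\ \frac{2}{5} & \frac{1}{5} \end{bmatrix}\begin{bmatrix} 1 \\ 1 \end{bmatrix} = (-3)\begin{bmatrix} -\frac{1}{5} \\ \frac{2}{5} \end{bmatrix} + (2)\begin{bmatrix} \frac{6}{5} \\ \frac{3}{5} \end{bmatrix} = \begin{bmatrix} \frac{3}{5} \\ -\frac{6}{5} \end{bmatrix} + \begin{bmatrix} \frac{12}{5} \\ \frac{6}{5} \end{bmatrix} = \begin{bmatrix} 3 \\ 0 \end{bmatrix} \tag{10}$$

Formulas (9) and (10) provide two different ways of viewing the image of the vector $(1, 1)$
under multiplication by $A$: Formula (9) tells us directly that the image of this vector is
$(3, 0)$, whereas Formula (10) tells us that this image can also be obtained by projecting
$(1, 1)$ onto the eigenspaces corresponding to $\lambda_1 = -3$ and $\lambda_2 = 2$ to obtain the vectors
$\left(-\frac{1}{5}, \frac{2}{5}\right)$ and $\left(\frac{6}{5}, \frac{3}{5}\right)$, then scaling by the eigenvalues to obtain $\left(\frac{3}{5}, -\frac{6}{5}\right)$ and $\left(\frac{12}{5}, \frac{6}{5}\right)$,
and then adding these vectors (see Figure 7.2.1).

The terminology spectral decomposition is derived from the fact that the set of all eigenvalues of a matrix
$A$ is sometimes called the spectrum of $A$. The terminology eigenvalue decomposition is due to Professor Dan
Kalman, who introduced it in an award-winning paper entitled "A Singularly Valuable Decomposition: The
SVD of a Matrix," The College Mathematics Journal, Vol. 27, No. 1, January 1996.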
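The "project, scale, add" reading of the spectral decomposition can be reproduced numerically for the matrix of Example 2. This is a minimal sketch assuming Python with NumPy (not part of the text); the variable names are illustrative only.

```python
import numpy as np

A = np.array([[1.0,  2.0],
              [2.0, -2.0]])

# Unit eigenvectors are the columns of U; eigenvalues come back in
# ascending order, here -3 and then 2.
eigenvalues, U = np.linalg.eigh(A)

# Each term lambda * u u^T of the spectral decomposition; summing the terms
# rebuilds A.
terms = [lam * np.outer(u, u) for lam, u in zip(eigenvalues, U.T)]
A_rebuilt = sum(terms)

# Project x = (1, 1) onto each eigenvector line, scale by the eigenvalue,
# and add the scaled projections.
x = np.array([1.0, 1.0])
projections = [np.outer(u, u) @ x for u in U.T]
Ax_from_projections = sum(lam * p for lam, p in zip(eigenvalues, projections))
```

The projection matrices `np.outer(u, u)` do not depend on the sign that the routine happens to choose for each eigenvector, so the two projections agree with the vectors $\left(-\frac{1}{5}, \frac{2}{5}\right)$ and $\left(\frac{6}{5}, \frac{3}{5}\right)$ that follow from Formula (10).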



The Nondiagonalizable If A is an n x n matrix that is not orthogonally diagonalizable, it may still be possible 
Case to achieve considerable simplification in the form of P T AP by choosing the orthogonal 
matrix P appropriately. We will consider two theorems (without proof) that illustrate 
this. The first, due to the German mathematician Issai Schur, states that every square 
matrix A is orthogonally similar to an upper triangular matrix that has the eigenvalues 
of A on the main diagonal. 
Schur's Theorem

If A is an n x n matrix with real entries and real eigenvalues, then there is an orthogonal
matrix P such that $P^TAP$ is an upper triangular matrix of the form

$$
P^TAP = \begin{bmatrix}
\lambda_1 & \times & \times & \cdots & \times \\
0 & \lambda_2 & \times & \cdots & \times \\
0 & 0 & \lambda_3 & \cdots & \times \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & \lambda_n
\end{bmatrix} \tag{11}
$$

in which $\lambda_1, \lambda_2, \ldots, \lambda_n$ are the eigenvalues of A repeated according to multiplicity.



▲ Figure 7.2.2 (the first subdiagonal of a matrix)


It is common to denote the upper triangular matrix in (11) by S (for Schur), in which
case that equation would be rewritten as

$$A = PSP^T \tag{12}$$

which is called a Schur decomposition of A.

The next theorem, due to the German electrical engineer Karl Hessenberg (1904-1959),
states that every square matrix with real entries is orthogonally similar to a matrix
in which each entry below the first subdiagonal is zero (Figure 7.2.2). Such a matrix is
said to be in upper Hessenberg form.


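Hessenberg's Theorem

If A is an n x n matrix with real entries, then there is an orthogonal matrix P such that
$P^TAP$ is a matrix of the form

$$
P^TAP = \begin{bmatrix}
\times & \times & \cdots & \times & \times & \times \\
\times & \times & \cdots & \times & \times & \times \\
0 & \times & \cdots & \times & \times & \times \\
\vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\
0 & 0 & \cdots & \times & \times & \times \\
0 & 0 & \cdots & 0 & \times & \times
\end{bmatrix} \tag{13}
$$

Note that unlike those in (11), the diagonal entries in (13) are usually not the eigenvalues
of A.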


It is common to denote the upper Hessenberg matrix in (13) by H (for Hessenberg),
in which case that equation can be rewritten as

$$A = PHP^T \tag{14}$$

which is called an upper Hessenberg decomposition of A.



Issai Schur 
(1875-1941) 


The life of the German mathematician Issai Schur is a sad reminder 
of the effect that Nazi policies had on Jewish intellectuals during the 1930s. Schur 
was a brilliant mathematician and a popular lecturer who attracted many students 
and researchers to the University of Berlin, where he worked and taught. His lectures 
sometimes attracted so many students that opera glasses were needed to see him 
from the back row. Schur's life became increasingly difficult under Nazi rule, and in 
April of 1933 he was forced to "retire" from the university under a law that prohibited 
non-Aryans from holding "civil service" positions. There was an outcry from many 
of his students and colleagues who respected and liked him, but it did not stave off 
his complete dismissal in 1935. Schur, who thought of himself as a loyal German, 
never understood the persecution and humiliation he received at Nazi hands. He left 
Germany for Palestine in 1939, a broken man. Lacking in financial resources, he had 
to sell his beloved mathematics books and lived in poverty until his death in 1941. 

[Image: Courtesy Electronic Publishing Services, Inc., New York City]
Remark In many numerical algorithms the initial matrix is first converted to upper Hessenberg
form to reduce the amount of computation in subsequent parts of the algorithm. Many computer 
packages have built-in commands for finding Schur and Hessenberg decompositions. 
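
For a 3 x 3 matrix, only one entry lies below the first subdiagonal, so a single plane rotation is enough to reach upper Hessenberg form. The sketch below illustrates that special case in plain Python; it is an illustration under stated assumptions, not the algorithm used by any particular package, and the function name `hessenberg_3x3` and the sample matrix are hypothetical.

```python
from math import hypot

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(row) for row in zip(*X)]

def hessenberg_3x3(A):
    """Return (P, H) with P orthogonal and H = P^T A P upper Hessenberg (3 x 3 only)."""
    a21, a31 = A[1][0], A[2][0]
    r = hypot(a21, a31)
    # Rotation in the (2, 3) coordinate plane chosen so that entry (3, 1) becomes zero.
    c, s = (1.0, 0.0) if r == 0.0 else (a21 / r, a31 / r)
    P = [[1.0, 0.0, 0.0],
         [0.0, c, -s],
         [0.0, s, c]]
    H = matmul(transpose(P), matmul(A, P))
    return P, H

A = [[4.0, 1.0, 2.0],
     [3.0, 0.0, 1.0],
     [4.0, 2.0, 5.0]]
P, H = hessenberg_3x3(A)
for row in H:
    print(row)
```

Because P only mixes the second and third coordinates, the first column of $P^TA$ is unaffected by the right multiplication by P, which is why the zero created in position (3, 1) survives. Larger matrices need a sequence of such orthogonal transformations, one column at a time.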


Exercise Set 7.2 


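In Exercises 1-6, find the characteristic equation of the given
symmetric matrix, and then by inspection determine the dimensions
of the eigenspaces.

1. $\begin{bmatrix} 1 & 2 \\ 2 & 4 \end{bmatrix}$

2. $\begin{bmatrix} 1 & -4 & 2 \\ -4 & 1 & -2 \\ 2 & -2 & -2 \end{bmatrix}$

3. $\begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix}$

4. $\begin{bmatrix} 4 & 2 & 2 \\ 2 & 4 & 2 \\ 2 & 2 & 4 \end{bmatrix}$

5. $\begin{bmatrix} 4 & 4 & 0 & 0 \\ 4 & 4 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}$

6. $\begin{bmatrix} 2 & -1 & 0 & 0 \\ -1 & 2 & 0 & 0 \\ 0 & 0 & 2 & -1 \\ 0 & 0 & -1 & 2 \end{bmatrix}$
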


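In Exercises 7-14, find a matrix P that orthogonally diagonalizes
A, and determine $P^{-1}AP$.

7. $A = \begin{bmatrix} 6 & 2\sqrt3 \\ 2\sqrt3 & 7 \end{bmatrix}$

8. $A = \begin{bmatrix} 3 & 1 \\ 1 & 3 \end{bmatrix}$

9. $A = \begin{bmatrix} -2 & 0 & -36 \\ 0 & -3 & 0 \\ -36 & 0 & -23 \end{bmatrix}$

10. $A = \begin{bmatrix} 6 & -2 \\ -2 & 3 \end{bmatrix}$

11. $A = \begin{bmatrix} 2 & -1 & -1 \\ -1 & 2 & -1 \\ -1 & -1 & 2 \end{bmatrix}$

12. $A = \begin{bmatrix} 1 & 1 & 0 \\ 1 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}$

13. $A = \begin{bmatrix} -7 & 24 & 0 & 0 \\ 24 & 7 & 0 & 0 \\ 0 & 0 & -7 & 24 \\ 0 & 0 & 24 & 7 \end{bmatrix}$

14. $A = \begin{bmatrix} 3 & 1 & 0 & 0 \\ 1 & 3 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}$
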


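In Exercises 19-20, determine whether there exists a 3 x 3 symmetric
matrix whose eigenvalues are $\lambda_1 = -1$, $\lambda_2 = 3$, $\lambda_3 = 7$ and
for which the corresponding eigenvectors are as stated. If there is
such a matrix, find it, and if there is none, explain why not.

19. $\mathbf{x}_1 = \begin{bmatrix} 0 \\ 1 \\ -1 \end{bmatrix}$, $\mathbf{x}_2 = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}$, $\mathbf{x}_3 = \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}$

20. $\mathbf{x}_1 = \begin{bmatrix} 0 \\ 1 \\ -1 \end{bmatrix}$, $\mathbf{x}_2 = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}$, $\mathbf{x}_3 = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$
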


21. Let A be a diagonalizable matrix with the property that eigenvectors
corresponding to distinct eigenvalues are orthogonal.
Must A be symmetric? Explain your reasoning.

22. Assuming that $b \neq 0$, find a matrix that orthogonally diagonalizes

$$\begin{bmatrix} a & b \\ b & a \end{bmatrix}$$

23. Let $T_A: R^2 \to R^2$ be multiplication by A. Find two orthogonal
unit vectors $\mathbf{u}_1$ and $\mathbf{u}_2$ such that $T_A(\mathbf{u}_1)$ and $T_A(\mathbf{u}_2)$ are
orthogonal.


(a) A = 


1 

1 


(b) A = 


2 

1 


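24. Let $T_A: R^3 \to R^3$ be multiplication by A. Find two orthogonal
unit vectors $\mathbf{u}_1$ and $\mathbf{u}_2$ such that $T_A(\mathbf{u}_1)$ and $T_A(\mathbf{u}_2)$ are
orthogonal.

(a) $A = \begin{bmatrix} 4 & 2 & 2 \\ 2 & 4 & 2 \\ 2 & 2 & 4 \end{bmatrix}$

(b) $A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 1 \\ 0 & 1 & 1 \end{bmatrix}$
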


Working with Proofs 

25. Prove that if A is any m x n matrix, then $A^TA$ has an orthonormal
set of n eigenvectors.


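In Exercises 15-18, find the spectral decomposition of the
matrix.

15. $\begin{bmatrix} 3 & 1 \\ 1 & 3 \end{bmatrix}$

16. $\begin{bmatrix} 6 & -2 \\ -2 & 3 \end{bmatrix}$

17. $\begin{bmatrix} -3 & 1 & 2 \\ 1 & -3 & 2 \\ 2 & 2 & 0 \end{bmatrix}$

18. $\begin{bmatrix} -2 & 0 & -36 \\ 0 & -3 & 0 \\ -36 & 0 & -23 \end{bmatrix}$
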


26. Prove: If $\{\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_n\}$ is an orthonormal basis for $R^n$, and
if A can be expressed as

$$A = c_1\mathbf{u}_1\mathbf{u}_1^T + c_2\mathbf{u}_2\mathbf{u}_2^T + \cdots + c_n\mathbf{u}_n\mathbf{u}_n^T$$

then A is symmetric and has eigenvalues $c_1, c_2, \ldots, c_n$.

27. Use the result in Exercise 29 of Section 5.1 to prove Theorem
7.2.2(a) for 2 x 2 symmetric matrices.

28. (a) Prove that if v is any n x 1 matrix and I is the n x n identity
matrix, then $I - \mathbf{v}\mathbf{v}^T$ is orthogonally diagonalizable.

(b) Find a matrix P that orthogonally diagonalizes $I - \mathbf{v}\mathbf{v}^T$ if

$$\mathbf{v} = \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}$$

29. Prove that if A is a symmetric orthogonal matrix, then 1 and
-1 are the only possible eigenvalues.

30. Is the converse of Theorem 7.2.2(b) true? Justify your answer.


31. In this exercise we will show that a symmetric matrix A is
orthogonally diagonalizable, thereby completing the missing
part of Theorem 7.2.1. We will proceed in two steps: first we
will show that A is diagonalizable, and then we will build on
that result to show that A is orthogonally diagonalizable.

(a) Assume that A is a symmetric n x n matrix. One way
to prove that A is diagonalizable is to show that for each
eigenvalue $\lambda_0$ the geometric multiplicity is equal to the
algebraic multiplicity. For this purpose, assume that the
geometric multiplicity of $\lambda_0$ is k, let $B_0 = \{\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_k\}$
be an orthonormal basis for the eigenspace corresponding
to the eigenvalue $\lambda_0$, extend this to an orthonormal
basis $B = \{\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_n\}$ for $R^n$, and let P be the matrix
having the vectors of B as columns. As shown in Exercise
40(b) of Section 5.2, the product AP can be written
as

$$AP = P\begin{bmatrix} \lambda_0 I_k & X \\ 0 & Y \end{bmatrix}$$

Use the fact that B is an orthonormal basis to prove that
X = 0 [a zero matrix of size k x (n - k)].

(b) It follows from part (a) and Exercise 40(c) of Section 5.2
that A has the same characteristic polynomial as

$$C = \begin{bmatrix} \lambda_0 I_k & 0 \\ 0 & Y \end{bmatrix}$$

Use this fact and Exercise 40(d) of Section 5.2 to prove that
the algebraic multiplicity of $\lambda_0$ is the same as the geometric
multiplicity of $\lambda_0$. This establishes that A is diagonalizable.

(c) Use Theorem 7.2.2(b) and the fact that A is diagonalizable
to prove that A is orthogonally diagonalizable.


True-False Exercises 

TF. In parts (a)-(g) determine whether the statement is true or
false, and justify your answer.

(a) If A is a square matrix, then $AA^T$ and $A^TA$ are orthogonally
diagonalizable.

(b) If $\mathbf{v}_1$ and $\mathbf{v}_2$ are eigenvectors from distinct eigenspaces of a
symmetric matrix with real entries, then

$$\|\mathbf{v}_1 + \mathbf{v}_2\|^2 = \|\mathbf{v}_1\|^2 + \|\mathbf{v}_2\|^2$$

(c) Every orthogonal matrix is orthogonally diagonalizable.

(d) If A is both invertible and orthogonally diagonalizable, then
$A^{-1}$ is orthogonally diagonalizable.

(e) Every eigenvalue of an orthogonal matrix has absolute value 1.

(f) If A is an n x n orthogonally diagonalizable matrix, then there
exists an orthonormal basis for $R^n$ consisting of eigenvectors
of A.

(g) If A is orthogonally diagonalizable, then A has real eigenvalues.


Working with Technology 

T1. If your technology utility has an orthogonal diagonalization
capability, use it to confirm the final result obtained in Example 1.

T2. For the given matrix A, find orthonormal bases for the
eigenspaces of A, and use those basis vectors to construct an
orthogonal matrix P for which $P^TAP$ is diagonal.

$$A = \begin{bmatrix} -4 & 2 & -2 \\ 2 & -7 & 4 \\ -2 & 4 & -7 \end{bmatrix}$$

T3. Find a spectral decomposition of the matrix A in Exercise T2.


7.3 Quadratic Forms 

In this section we will use matrix methods to study real-valued functions of several 
variables in which each term is either the square of a variable or the product of two 
variables. Such functions arise in a variety of applications, including geometry, vibrations 
of mechanical systems, statistics, and electrical engineering. 


Definition of a Quadratic Form

Expressions of the form

$$a_1x_1 + a_2x_2 + \cdots + a_nx_n$$

occurred in our study of linear equations and linear systems. If $a_1, a_2, \ldots, a_n$ are
treated as fixed constants, then this expression is a real-valued function of the n variables
$x_1, x_2, \ldots, x_n$ and is called a linear form on $R^n$. All variables in a linear form occur to
the first power and there are no products of variables. Here we will be concerned with
quadratic forms on $R^n$, which are functions of the form

$$a_1x_1^2 + a_2x_2^2 + \cdots + a_nx_n^2 + (\text{all possible terms } a_kx_ix_j \text{ in which } i \neq j)$$

The terms of the form $a_kx_ix_j$ are called cross product terms. It is common to combine
the cross product terms involving $x_ix_j$ with those involving $x_jx_i$ to avoid duplication.
Thus, a general quadratic form on $R^2$ would typically be expressed as

$$a_1x_1^2 + a_2x_2^2 + 2a_3x_1x_2 \tag{1}$$

and a general quadratic form on $R^3$ as

$$a_1x_1^2 + a_2x_2^2 + a_3x_3^2 + 2a_4x_1x_2 + 2a_5x_1x_3 + 2a_6x_2x_3 \tag{2}$$


If, as usual, we do not distinguish between the number a and the 1 x 1 matrix [a], and
if we let x be the column vector of variables, then (1) and (2) can be expressed in matrix
form as

$$
\begin{bmatrix} x_1 & x_2 \end{bmatrix}
\begin{bmatrix} a_1 & a_3 \\ a_3 & a_2 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \mathbf{x}^TA\mathbf{x}
$$

$$
\begin{bmatrix} x_1 & x_2 & x_3 \end{bmatrix}
\begin{bmatrix} a_1 & a_4 & a_5 \\ a_4 & a_2 & a_6 \\ a_5 & a_6 & a_3 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \mathbf{x}^TA\mathbf{x}
$$

(verify). Note that the matrix A in these formulas is symmetric, that its diagonal entries
are the coefficients of the squared terms, and its off-diagonal entries are half the coefficients
of the cross product terms. In general, if A is a symmetric n x n matrix and x is
an n x 1 column vector of variables, then we call the function

$$Q_A(\mathbf{x}) = \mathbf{x}^TA\mathbf{x} \tag{3}$$

the quadratic form associated with A. When convenient, (3) can be expressed in dot
product notation as

$$\mathbf{x}^TA\mathbf{x} = \mathbf{x} \cdot A\mathbf{x} = A\mathbf{x} \cdot \mathbf{x} \tag{4}$$


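In the case where A is a diagonal matrix, the quadratic form $\mathbf{x}^TA\mathbf{x}$ has no cross
product terms; for example, if A has diagonal entries $\lambda_1, \lambda_2, \ldots, \lambda_n$, then

$$
\mathbf{x}^TA\mathbf{x} = \begin{bmatrix} x_1 & x_2 & \cdots & x_n \end{bmatrix}
\begin{bmatrix}
\lambda_1 & 0 & \cdots & 0 \\
0 & \lambda_2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \lambda_n
\end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}
= \lambda_1x_1^2 + \lambda_2x_2^2 + \cdots + \lambda_nx_n^2
$$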


► EXAMPLE 1 Expressing Quadratic Forms in Matrix Notation

In each part, express the quadratic form in the matrix notation $\mathbf{x}^TA\mathbf{x}$, where A is
symmetric.

(a) $2x^2 + 6xy - 5y^2$    (b) $x_1^2 + 7x_2^2 - 3x_3^2 + 4x_1x_2 - 2x_1x_3 + 8x_2x_3$

Solution The diagonal entries of A are the coefficients of the squared terms, and the
off-diagonal entries are half the coefficients of the cross product terms, so

$$
2x^2 + 6xy - 5y^2 = \begin{bmatrix} x & y \end{bmatrix}
\begin{bmatrix} 2 & 3 \\ 3 & -5 \end{bmatrix}
\begin{bmatrix} x \\ y \end{bmatrix}
$$

$$
x_1^2 + 7x_2^2 - 3x_3^2 + 4x_1x_2 - 2x_1x_3 + 8x_2x_3 = \begin{bmatrix} x_1 & x_2 & x_3 \end{bmatrix}
\begin{bmatrix} 1 & 2 & -1 \\ 2 & 7 & 4 \\ -1 & 4 & -3 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
$$
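
The rule used in Example 1 (squared-term coefficients on the diagonal, half of each cross product coefficient off the diagonal) can be sketched in a few lines of Python. This is an illustration only; the function names `symmetric_matrix` and `quadratic_form` and the zero-based index convention for the cross product terms are assumptions made for the sketch.

```python
def symmetric_matrix(squares, cross):
    """Build the symmetric matrix A of a quadratic form.

    squares[i] is the coefficient of x_i^2; cross[(i, j)] is the coefficient
    of the cross product term x_i x_j (zero-based indices, i != j).
    """
    n = len(squares)
    A = [[0.0] * n for _ in range(n)]
    for i, coeff in enumerate(squares):
        A[i][i] = float(coeff)
    for (i, j), coeff in cross.items():
        A[i][j] = A[j][i] = coeff / 2  # split the cross term evenly
    return A

def quadratic_form(A, x):
    """Evaluate x^T A x."""
    n = len(x)
    return sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))

# Part (b) of Example 1: x1^2 + 7x2^2 - 3x3^2 + 4x1x2 - 2x1x3 + 8x2x3
A = symmetric_matrix([1, 7, -3], {(0, 1): 4, (0, 2): -2, (1, 2): 8})
x = [1, 2, 3]
polynomial_value = (x[0]**2 + 7*x[1]**2 - 3*x[2]**2
                    + 4*x[0]*x[1] - 2*x[0]*x[2] + 8*x[1]*x[2])
print(A, quadratic_form(A, x), polynomial_value)
```

The matrix produced for part (b) should match the one in the solution, and the two evaluations at the sample point should agree.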
Change of Variable in a
Quadratic Form


There are three important kinds of problems that occur in applications of quadratic
forms:


Problem 1 If $\mathbf{x}^TA\mathbf{x}$ is a quadratic form on $R^2$ or $R^3$, what kind of curve or surface is
represented by the equation $\mathbf{x}^TA\mathbf{x} = k$?

Problem 2 If $\mathbf{x}^TA\mathbf{x}$ is a quadratic form on $R^n$, what conditions must A satisfy for
$\mathbf{x}^TA\mathbf{x}$ to have positive values for $\mathbf{x} \neq \mathbf{0}$?

Problem 3 If $\mathbf{x}^TA\mathbf{x}$ is a quadratic form on $R^n$, what are its maximum and minimum
values if x is constrained to satisfy $\|\mathbf{x}\| = 1$?


We will consider the first two problems in this section and the third problem in the next 
section. 

Many of the techniques for solving these problems are based on simplifying the
quadratic form $\mathbf{x}^TA\mathbf{x}$ by making a substitution

$$\mathbf{x} = P\mathbf{y} \tag{5}$$

that expresses the variables $x_1, x_2, \ldots, x_n$ in terms of new variables $y_1, y_2, \ldots, y_n$. If P
is invertible, then we call (5) a change of variable, and if P is orthogonal, then we call (5)
an orthogonal change of variable.

If we make the change of variable $\mathbf{x} = P\mathbf{y}$ in the quadratic form $\mathbf{x}^TA\mathbf{x}$, then we obtain

$$\mathbf{x}^TA\mathbf{x} = (P\mathbf{y})^TA(P\mathbf{y}) = \mathbf{y}^TP^TAP\mathbf{y} = \mathbf{y}^T(P^TAP)\mathbf{y} \tag{6}$$

Since the matrix $B = P^TAP$ is symmetric (verify), the effect of the change of variable is
to produce a new quadratic form $\mathbf{y}^TB\mathbf{y}$ in the variables $y_1, y_2, \ldots, y_n$. In particular, if
we choose P to orthogonally diagonalize A, then the new quadratic form will be $\mathbf{y}^TD\mathbf{y}$,
where D is a diagonal matrix with the eigenvalues of A on the main diagonal; that is,

$$
\mathbf{x}^TA\mathbf{x} = \mathbf{y}^TD\mathbf{y} = \begin{bmatrix} y_1 & y_2 & \cdots & y_n \end{bmatrix}
\begin{bmatrix}
\lambda_1 & 0 & \cdots & 0 \\
0 & \lambda_2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \lambda_n
\end{bmatrix}
\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}
= \lambda_1y_1^2 + \lambda_2y_2^2 + \cdots + \lambda_ny_n^2
$$

Thus, we have the following result, called the principal axes theorem. 


THEOREM 7.3.1 The Principal Axes Theorem

If A is a symmetric n x n matrix, then there is an orthogonal change of variable that
transforms the quadratic form $\mathbf{x}^TA\mathbf{x}$ into a quadratic form $\mathbf{y}^TD\mathbf{y}$ with no cross product
terms. Specifically, if P orthogonally diagonalizes A, then making the change of variable
$\mathbf{x} = P\mathbf{y}$ in the quadratic form $\mathbf{x}^TA\mathbf{x}$ yields the quadratic form

$$\mathbf{x}^TA\mathbf{x} = \mathbf{y}^TD\mathbf{y} = \lambda_1y_1^2 + \lambda_2y_2^2 + \cdots + \lambda_ny_n^2$$

in which $\lambda_1, \lambda_2, \ldots, \lambda_n$ are the eigenvalues of A corresponding to the eigenvectors that
form the successive columns of P.
Quadratic Forms in
Geometry


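► EXAMPLE 2 An Illustration of the Principal Axes Theorem

Find an orthogonal change of variable that eliminates the cross product terms in the
quadratic form $Q = x_1^2 - x_3^2 - 4x_1x_2 + 4x_2x_3$, and express Q in terms of the new
variables.

Solution The quadratic form can be expressed in matrix notation as

$$
Q = \mathbf{x}^TA\mathbf{x} = \begin{bmatrix} x_1 & x_2 & x_3 \end{bmatrix}
\begin{bmatrix} 1 & -2 & 0 \\ -2 & 0 & 2 \\ 0 & 2 & -1 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
$$

The characteristic equation of the matrix A is

$$
\begin{vmatrix} \lambda - 1 & 2 & 0 \\ 2 & \lambda & -2 \\ 0 & -2 & \lambda + 1 \end{vmatrix}
= \lambda^3 - 9\lambda = \lambda(\lambda + 3)(\lambda - 3) = 0
$$

so the eigenvalues are $\lambda = 0, -3, 3$. We leave it for you to show that orthonormal bases
for the three eigenspaces are

$$
\lambda = 0\colon \begin{bmatrix} \frac23 \\ \frac13 \\ \frac23 \end{bmatrix}, \qquad
\lambda = -3\colon \begin{bmatrix} -\frac13 \\ -\frac23 \\ \frac23 \end{bmatrix}, \qquad
\lambda = 3\colon \begin{bmatrix} -\frac23 \\ \frac23 \\ \frac13 \end{bmatrix}
$$

Thus, a substitution $\mathbf{x} = P\mathbf{y}$ that eliminates the cross product terms is

$$
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} =
\begin{bmatrix} \frac23 & -\frac13 & -\frac23 \\ \frac13 & -\frac23 & \frac23 \\ \frac23 & \frac23 & \frac13 \end{bmatrix}
\begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix}
$$

This produces the new quadratic form

$$
Q = \mathbf{y}^T(P^TAP)\mathbf{y} = \begin{bmatrix} y_1 & y_2 & y_3 \end{bmatrix}
\begin{bmatrix} 0 & 0 & 0 \\ 0 & -3 & 0 \\ 0 & 0 & 3 \end{bmatrix}
\begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix}
= -3y_2^2 + 3y_3^2
$$

in which there are no cross product terms. ◄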
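
The change of variable in Example 2 can be verified with exact rational arithmetic. The sketch below, which is an illustration and not part of the original solution, forms $P^TAP$ from the matrices in the example and also compares Q evaluated in the old and new variables at one sample point; the sample vector y is an arbitrary choice.

```python
from fractions import Fraction

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(row) for row in zip(*X)]

A = [[1, -2, 0],
     [-2, 0, 2],
     [0, 2, -1]]
t = Fraction(1, 3)
# Columns are the orthonormal eigenvectors for lambda = 0, -3, 3.
P = [[2 * t, -1 * t, -2 * t],
     [1 * t, -2 * t, 2 * t],
     [2 * t, 2 * t, 1 * t]]

D = matmul(transpose(P), matmul(A, P))  # expected to be diagonal

def Q(x):
    """The quadratic form of Example 2 in the original variables."""
    x1, x2, x3 = x
    return x1**2 - x3**2 - 4 * x1 * x2 + 4 * x2 * x3

y = [1, 2, 3]                                   # arbitrary sample point
x = [sum(P[i][j] * y[j] for j in range(3)) for i in range(3)]  # x = P y
print(D, Q(x), -3 * y[1]**2 + 3 * y[2]**2)
```

Using `Fraction` avoids rounding, so the off-diagonal entries of D come out as exact zeros rather than small floating-point residues.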


Remark If A is a symmetric n x n matrix, then the quadratic form $\mathbf{x}^TA\mathbf{x}$ is a real-valued function
whose range is the set of all possible values for $\mathbf{x}^TA\mathbf{x}$ as x varies over $R^n$. It can be shown that an
orthogonal change of variable $\mathbf{x} = P\mathbf{y}$ does not alter the range of a quadratic form; that is, the
set of all values for $\mathbf{x}^TA\mathbf{x}$ as x varies over $R^n$ is the same as the set of all values for $\mathbf{y}^T(P^TAP)\mathbf{y}$ as y
varies over $R^n$.


Recall that a conic section or conic is a curve that results by cutting a double-napped cone 
with a plane (Figure 7.3.1). The most important conic sections are ellipses, hyperbolas, 
and parabolas, which result when the cutting plane does not pass through the vertex. 
Circles are special cases of ellipses that result when the cutting plane is perpendicular to 
the axis of symmetry of the cone. If the cutting plane passes through the vertex, then the 
resulting intersection is called a degenerate conic. The possibilities are a point, a pair of 
intersecting lines, or a single line. 

► Figure 7.3.1 (panels: Circle, Ellipse, Parabola, Hyperbola)

Quadratic forms in $R^2$ arise naturally in the study of conic sections. For example, it
is shown in analytic geometry that an equation of the form

$$ax^2 + 2bxy + cy^2 + dx + ey + f = 0 \tag{7}$$

in which a, b, and c are not all zero, represents a conic section. If d = e = 0 in (7),
then there are no linear terms, so the equation becomes

$$ax^2 + 2bxy + cy^2 + f = 0 \tag{8}$$

and is said to represent a central conic. These include circles, ellipses, and hyperbolas,
but not parabolas. Furthermore, if b = 0 in (8), then there is no cross product term (i.e.,
term involving xy), and the equation

$$ax^2 + cy^2 + f = 0 \tag{9}$$

is said to represent a central conic in standard position. The most important conics of
this type are shown in Table 1.


Table 1 



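If we take the constant f in Equations (8) and (9) to the right side and let k = -f,
then we can rewrite these equations in matrix form as

$$
\begin{bmatrix} x & y \end{bmatrix}
\begin{bmatrix} a & b \\ b & c \end{bmatrix}
\begin{bmatrix} x \\ y \end{bmatrix} = k
\quad\text{and}\quad
\begin{bmatrix} x & y \end{bmatrix}
\begin{bmatrix} a & 0 \\ 0 & c \end{bmatrix}
\begin{bmatrix} x \\ y \end{bmatrix} = k
\tag{10}
$$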


We must also allow for the possibility that there are no real values of x and y that satisfy the equation, as 
with x 2 + y 2 + 1 = 0. In such cases we say that the equation has no graph or has an empty graph. 
▲ Figure 7.3.2 A central conic rotated out of standard position

Identifying Conic Sections

▲ Figure 7.3.3

The first of these corresponds to Equation (8) in which there is a cross product term
2bxy, and the second corresponds to Equation (9) in which there is no cross product
term. Geometrically, the existence of a cross product term signals that the graph of the
quadratic form is rotated about the origin, as in Figure 7.3.2. The three-dimensional
analogs of the equations in (10) are

$$
\begin{bmatrix} x & y & z \end{bmatrix}
\begin{bmatrix} a & d & e \\ d & b & f \\ e & f & c \end{bmatrix}
\begin{bmatrix} x \\ y \\ z \end{bmatrix} = k
\quad\text{and}\quad
\begin{bmatrix} x & y & z \end{bmatrix}
\begin{bmatrix} a & 0 & 0 \\ 0 & b & 0 \\ 0 & 0 & c \end{bmatrix}
\begin{bmatrix} x \\ y \\ z \end{bmatrix} = k
\tag{11}
$$

If a, b, and c are not all zero, then the graphs in $R^3$ of the equations in (11) are called
central quadrics; the graph of the second of these equations, which is a special case of
the first, is called a central quadric in standard position.


We are now ready to consider the first of the three problems posed earlier, identifying
the curve or surface represented by an equation $\mathbf{x}^TA\mathbf{x} = k$ in two or three variables. We
will focus on the two-variable case. We noted above that an equation of the form

$$ax^2 + 2bxy + cy^2 + f = 0 \tag{12}$$

represents a central conic. If b = 0, then the conic is in standard position, and if $b \neq 0$, it
is rotated. It is an easy matter to identify central conics in standard position by matching
the equation with one of the standard forms. For example, the equation

$$9x^2 + 16y^2 - 144 = 0$$

can be rewritten as

$$\frac{x^2}{16} + \frac{y^2}{9} = 1$$

which, by comparison with Table 1, is the ellipse shown in Figure 7.3.3.

If a central conic is rotated out of standard position, then it can be identified by
first rotating the coordinate axes to put it in standard position and then matching the
resulting equation with one of the standard forms in Table 1. To find a rotation that
eliminates the cross product term in the equation

$$ax^2 + 2bxy + cy^2 = k \tag{13}$$

it will be convenient to express the equation in the matrix form

$$
\mathbf{x}^TA\mathbf{x} = \begin{bmatrix} x & y \end{bmatrix}
\begin{bmatrix} a & b \\ b & c \end{bmatrix}
\begin{bmatrix} x \\ y \end{bmatrix} = k \tag{14}
$$

and look for a change of variable

$$\mathbf{x} = P\mathbf{x}'$$

that diagonalizes A and for which det(P) = 1. Since we saw in Example 4 of Section 7.1
that the transition matrix

$$
P = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \tag{15}
$$

has the effect of rotating the xy-axes of a rectangular coordinate system through an angle
$\theta$, our problem reduces to finding $\theta$ that diagonalizes A, thereby eliminating the cross
product term in (13). If we make this change of variable, then in the x'y'-coordinate
system, Equation (14) will become

$$
\mathbf{x}'^TD\mathbf{x}' = \begin{bmatrix} x' & y' \end{bmatrix}
\begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix}
\begin{bmatrix} x' \\ y' \end{bmatrix} = k \tag{16}
$$

where $\lambda_1$ and $\lambda_2$ are the eigenvalues of A. The conic can now be identified by writing
(16) in the form

$$\lambda_1x'^2 + \lambda_2y'^2 = k \tag{17}$$
and performing the necessary algebra to match it with one of the standard forms in
Table 1. For example, if $\lambda_1$, $\lambda_2$, and k are positive, then (17) represents an ellipse with
an axis of length $2\sqrt{k/\lambda_1}$ in the x'-direction and $2\sqrt{k/\lambda_2}$ in the y'-direction. The first
column vector of P, which is a unit eigenvector corresponding to $\lambda_1$, is along the positive
x'-axis; and the second column vector of P, which is a unit eigenvector corresponding
to $\lambda_2$, is a unit vector along the y'-axis. These are called the principal axes of the
ellipse, which explains why Theorem 7.3.1 is called "the principal axes theorem." (See
Figure 7.3.4.)



► EXAMPLE 3 Identifying a Conic by Eliminating the Cross Product Term

(a) Identify the conic whose equation is $5x^2 - 4xy + 8y^2 - 36 = 0$ by rotating the
xy-axes to put the conic in standard position.

(b) Find the angle $\theta$ through which you rotated the xy-axes in part (a).

Solution (a) The given equation can be written in the matrix form

$$\mathbf{x}^TA\mathbf{x} = 36$$

where

$$A = \begin{bmatrix} 5 & -2 \\ -2 & 8 \end{bmatrix}$$

The characteristic polynomial of A is

$$
\begin{vmatrix} \lambda - 5 & 2 \\ 2 & \lambda - 8 \end{vmatrix} = (\lambda - 4)(\lambda - 9)
$$

so the eigenvalues are $\lambda = 4$ and $\lambda = 9$. We leave it for you to show that orthonormal
bases for the eigenspaces are

$$
\lambda = 4\colon \begin{bmatrix} \frac{2}{\sqrt5} \\ \frac{1}{\sqrt5} \end{bmatrix}, \qquad
\lambda = 9\colon \begin{bmatrix} -\frac{1}{\sqrt5} \\ \frac{2}{\sqrt5} \end{bmatrix}
$$

Thus, A is orthogonally diagonalized by

$$
P = \begin{bmatrix} \frac{2}{\sqrt5} & -\frac{1}{\sqrt5} \\ \frac{1}{\sqrt5} & \frac{2}{\sqrt5} \end{bmatrix} \tag{18}
$$

Moreover, it happens by chance that det(P) = 1, so we are assured that the substitution
$\mathbf{x} = P\mathbf{x}'$ performs a rotation of axes. It follows from (16) that the equation of the conic
in the x'y'-coordinate system is

$$
\begin{bmatrix} x' & y' \end{bmatrix}
\begin{bmatrix} 4 & 0 \\ 0 & 9 \end{bmatrix}
\begin{bmatrix} x' \\ y' \end{bmatrix} = 36
$$

Had it turned out that det(P) = -1, then we would have interchanged the columns
to reverse the sign.

which we can write as

$$4x'^2 + 9y'^2 = 36 \quad\text{or}\quad \frac{x'^2}{9} + \frac{y'^2}{4} = 1$$

We can now see from Table 1 that the conic is an ellipse whose axis has length 2a = 6 in
the x'-direction and length 2b = 4 in the y'-direction.



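Solution (b) It follows from (15) that

$$
P = \begin{bmatrix} \frac{2}{\sqrt5} & -\frac{1}{\sqrt5} \\ \frac{1}{\sqrt5} & \frac{2}{\sqrt5} \end{bmatrix}
= \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}
$$

which implies that

$$
\cos\theta = \frac{2}{\sqrt5}, \qquad \sin\theta = \frac{1}{\sqrt5}, \qquad \tan\theta = \frac{\sin\theta}{\cos\theta} = \frac12
$$

Thus, $\theta = \tan^{-1}\frac12 \approx 26.6^\circ$ (Figure 7.3.5). ◄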


Remark In the exercises we will ask you to show that if $b \neq 0$, then the cross product term in
the equation

$$ax^2 + 2bxy + cy^2 = k$$

can be eliminated by a rotation through an angle $\theta$ that satisfies

$$\cot 2\theta = \frac{a - c}{2b} \tag{19}$$

We leave it for you to confirm that this is consistent with part (b) of the last example.
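
The consistency check suggested in the remark can be carried out numerically. The sketch below substitutes the rotation (15) into $ax^2 + 2bxy + cy^2$ and reports the coefficients in the rotated coordinates; the function name `rotate_conic` is a hypothetical helper, and the formulas inside it are simply the entries of $P^TAP$ written out.

```python
from math import atan, cos, sin, tan, degrees

def rotate_conic(a, b, c, theta):
    """Coefficients (a', b', c') of a'x'^2 + 2b'x'y' + c'y'^2 after rotating the axes by theta."""
    cs, sn = cos(theta), sin(theta)
    a_new = a * cs * cs + 2 * b * cs * sn + c * sn * sn
    b_new = (c - a) * cs * sn + b * (cs * cs - sn * sn)
    c_new = a * sn * sn - 2 * b * cs * sn + c * cs * cs
    return a_new, b_new, c_new

# Example 3: 5x^2 - 4xy + 8y^2 = 36, so a = 5, 2b = -4, c = 8.
a, b, c = 5.0, -2.0, 8.0
theta = atan(0.5)                       # angle found in part (b)
cot_2theta = 1 / tan(2 * theta)         # left side of (19)
a_new, b_new, c_new = rotate_conic(a, b, c, theta)

print(degrees(theta), cot_2theta, (a - c) / (2 * b))
print(a_new, b_new, c_new)
```

For this angle the two sides of (19) should agree, the cross product coefficient should vanish up to rounding, and the remaining coefficients should be the eigenvalues 4 and 9 found in part (a).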


Positive Definite Quadratic
Forms


We will now consider the second of the three problems posed earlier, determining conditions
under which $\mathbf{x}^TA\mathbf{x} > 0$ for all nonzero values of x. We will explain why this is
important shortly, but first we introduce some terminology.


The terminology in Definition 1 also applies to the matrix A;
that is, A is positive definite, negative definite, or indefinite
in accordance with whether the associated quadratic form
has that property.


DEFINITION 1 A quadratic form $\mathbf{x}^TA\mathbf{x}$ is said to be
positive definite if $\mathbf{x}^TA\mathbf{x} > 0$ for $\mathbf{x} \neq \mathbf{0}$;
negative definite if $\mathbf{x}^TA\mathbf{x} < 0$ for $\mathbf{x} \neq \mathbf{0}$;
indefinite if $\mathbf{x}^TA\mathbf{x}$ has both positive and negative values.


The following theorem, whose proof is deferred to the end of the section, provides a 
way of using eigenvalues to determine whether a matrix A and its associated quadratic 
form x r Ax are positive definite, negative definite, or indefinite. 


THEOREM 7.3.2 If A is a symmetric matrix, then:

(a) $\mathbf{x}^TA\mathbf{x}$ is positive definite if and only if all eigenvalues of A are positive.

(b) $\mathbf{x}^TA\mathbf{x}$ is negative definite if and only if all eigenvalues of A are negative.

(c) $\mathbf{x}^TA\mathbf{x}$ is indefinite if and only if A has at least one positive eigenvalue and at least
one negative eigenvalue.


Remark The three classifications in Definition 1 do not exhaust all of the possibilities. For
example, a quadratic form for which $\mathbf{x}^TA\mathbf{x} \geq 0$ if $\mathbf{x} \neq \mathbf{0}$ is called positive semidefinite, and one for
which $\mathbf{x}^TA\mathbf{x} \leq 0$ if $\mathbf{x} \neq \mathbf{0}$ is called negative semidefinite. Every positive definite form is positive
semidefinite, but not conversely, and every negative definite form is negative semidefinite, but not
conversely (why?). By adjusting the proof of Theorem 7.3.2 appropriately, one can prove that
$\mathbf{x}^TA\mathbf{x}$ is positive semidefinite if and only if all eigenvalues of A are nonnegative and is negative
semidefinite if and only if all eigenvalues of A are nonpositive.
► EXAMPLE 4 Positive Definite Quadratic Forms

It is not usually possible to tell from the signs of the entries in a symmetric matrix A
whether that matrix is positive definite, negative definite, or indefinite. For example, the
entries of the matrix

$$A = \begin{bmatrix} 3 & 1 & 1 \\ 1 & 0 & 2 \\ 1 & 2 & 0 \end{bmatrix}$$

are nonnegative, but the matrix is indefinite since its eigenvalues are $\lambda = 1, 4, -2$ (verify).
To see this another way, let us write out the quadratic form as

$$
\mathbf{x}^TA\mathbf{x} = \begin{bmatrix} x_1 & x_2 & x_3 \end{bmatrix}
\begin{bmatrix} 3 & 1 & 1 \\ 1 & 0 & 2 \\ 1 & 2 & 0 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
= 3x_1^2 + 2x_1x_2 + 2x_1x_3 + 4x_2x_3
$$

Positive definite and negative definite matrices are invertible. Why?

We can now see, for example, that

$$\mathbf{x}^TA\mathbf{x} = 4 \quad\text{for}\quad x_1 = 0,\; x_2 = 1,\; x_3 = 1$$

and

$$\mathbf{x}^TA\mathbf{x} = -4 \quad\text{for}\quad x_1 = 0,\; x_2 = 1,\; x_3 = -1$$


Classifying Conic Sections
Using Eigenvalues


If $\mathbf{x}^TB\mathbf{x} = k$ is the equation of a conic, and if $k \neq 0$, then we can divide through by k
and rewrite the equation in the form

$$\mathbf{x}^TA\mathbf{x} = 1 \tag{20}$$

where A = (1/k)B. If we now rotate the coordinate axes to eliminate the cross product
term (if any) in this equation, then the equation of the conic in the new coordinate system
will be of the form

$$\lambda_1x'^2 + \lambda_2y'^2 = 1 \tag{21}$$

in which $\lambda_1$ and $\lambda_2$ are the eigenvalues of A. The particular type of conic represented by
this equation will depend on the signs of the eigenvalues $\lambda_1$ and $\lambda_2$. For example, you
should be able to see from (21) that:

$\mathbf{x}^TA\mathbf{x} = 1$ represents an ellipse if $\lambda_1 > 0$ and $\lambda_2 > 0$.

$\mathbf{x}^TA\mathbf{x} = 1$ has no graph if $\lambda_1 < 0$ and $\lambda_2 < 0$.

$\mathbf{x}^TA\mathbf{x} = 1$ represents a hyperbola if $\lambda_1$ and $\lambda_2$ have opposite signs.

In the case of the ellipse, Equation (21) can be rewritten as

$$\frac{x'^2}{(1/\sqrt{\lambda_1})^2} + \frac{y'^2}{(1/\sqrt{\lambda_2})^2} = 1 \tag{22}$$

so the axes of the ellipse have lengths $2/\sqrt{\lambda_1}$ and $2/\sqrt{\lambda_2}$ (Figure 7.3.6).

The following theorem is an immediate consequence of this discussion and Theorem
7.3.2.


THEOREM 7.3.3 If A is a symmetric 2 x 2 matrix, then:

(a) $\mathbf{x}^TA\mathbf{x} = 1$ represents an ellipse if A is positive definite.

(b) $\mathbf{x}^TA\mathbf{x} = 1$ has no graph if A is negative definite.

(c) $\mathbf{x}^TA\mathbf{x} = 1$ represents a hyperbola if A is indefinite.
In Example 3 we performed a rotation to show that the equation

$$5x^2 - 4xy + 8y^2 - 36 = 0$$

represents an ellipse with a major axis of length 6 and a minor axis of length 4. This
conclusion can also be obtained by rewriting the equation in the form

$$\tfrac{5}{36}x^2 - \tfrac19xy + \tfrac29y^2 = 1$$

and showing that the associated matrix

$$A = \begin{bmatrix} \frac{5}{36} & -\frac{1}{18} \\ -\frac{1}{18} & \frac29 \end{bmatrix}$$

has eigenvalues $\lambda_1 = \frac19$ and $\lambda_2 = \frac14$. These eigenvalues are positive, so the matrix A is
positive definite and the equation represents an ellipse. Moreover, it follows from (21)
that the axes of the ellipse have lengths $2/\sqrt{\lambda_1} = 6$ and $2/\sqrt{\lambda_2} = 4$, which is consistent
with Example 3.
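
For a symmetric 2 x 2 matrix the eigenvalues have a closed form, so the classification in Theorem 7.3.3 is easy to sketch in code. The following is an illustration only; the function names are hypothetical, the matrix is passed as its entries a, b, c (so that $A = \begin{bmatrix} a & b \\ b & c \end{bmatrix}$), and a zero eigenvalue is reported separately because the theorem does not cover that case.

```python
from math import sqrt

def eigenvalues_2x2_symmetric(a, b, c):
    """Eigenvalues (smaller, larger) of the symmetric matrix [[a, b], [b, c]]."""
    mean = (a + c) / 2
    radius = sqrt(((a - c) / 2) ** 2 + b ** 2)
    return mean - radius, mean + radius

def classify_unit_conic(a, b, c):
    """Classify the graph of x^T A x = 1 from the signs of the eigenvalues."""
    lo, hi = eigenvalues_2x2_symmetric(a, b, c)
    if lo > 0:
        return "ellipse"
    if hi < 0:
        return "no graph"
    if lo < 0 < hi:
        return "hyperbola"
    return "not covered by Theorem 7.3.3"  # at least one eigenvalue is zero

# The conic of Example 3 after dividing through by 36.
a, b, c = 5 / 36, -1 / 18, 2 / 9
lam1, lam2 = eigenvalues_2x2_symmetric(a, b, c)
axis_lengths = (2 / sqrt(lam1), 2 / sqrt(lam2))
print(classify_unit_conic(a, b, c), lam1, lam2, axis_lengths)
```

With the entries above, the eigenvalues should come out close to 1/9 and 1/4 and the axis lengths close to 6 and 4, in agreement with the discussion of Example 3.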


Identifying Positive Definite Matrices

Positive definite matrices are the most important symmetric matrices in applications, so
it will be useful to learn a little more about them. We already know that a symmetric
matrix is positive definite if and only if its eigenvalues are all positive; now we will give
a criterion that can be used to determine whether a symmetric matrix is positive definite
without finding the eigenvalues. For this purpose we define the kth principal submatrix of
an n x n matrix A to be the k x k submatrix consisting of the first k rows and columns
of A. For example, here are the principal submatrices of a general 4 x 4 matrix:

First principal submatrix:

$$\begin{bmatrix} a_{11} \end{bmatrix}$$

Second principal submatrix:

$$\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}$$

Third principal submatrix:

$$\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}$$

Fourth principal submatrix = A:

$$\begin{bmatrix} a_{11} & a_{12} & a_{13} & a_{14} \\ a_{21} & a_{22} & a_{23} & a_{24} \\ a_{31} & a_{32} & a_{33} & a_{34} \\ a_{41} & a_{42} & a_{43} & a_{44} \end{bmatrix}$$


The following theorem, which we state without proof, provides a determinant test 
for ascertaining whether a symmetric matrix is positive definite. 


THEOREM 7.3.4 If A is a symmetric matrix, then:

(a) A is positive definite if and only if the determinant of every principal submatrix is
positive.

(b) A is negative definite if and only if the determinants of the principal submatrices
alternate between negative and positive values starting with a negative value for the
determinant of the first principal submatrix.

(c) A is indefinite if and only if it is neither positive definite nor negative definite and
at least one principal submatrix has a positive determinant and at least one has a
negative determinant.
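► EXAMPLE 5 Working with Principal Submatrices

The matrix

$$A = \begin{bmatrix} 2 & -1 & -3 \\ -1 & 2 & 4 \\ -3 & 4 & 9 \end{bmatrix}$$

is positive definite since the determinants

$$
|2| = 2, \qquad
\begin{vmatrix} 2 & -1 \\ -1 & 2 \end{vmatrix} = 3, \qquad
\begin{vmatrix} 2 & -1 & -3 \\ -1 & 2 & 4 \\ -3 & 4 & 9 \end{vmatrix} = 1
$$

are all positive. Thus, we are guaranteed that all eigenvalues of A are positive and
$\mathbf{x}^TA\mathbf{x} > 0$ for $\mathbf{x} \neq \mathbf{0}$.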
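
The determinant test of part (a) is straightforward to sketch for small matrices. The code below is an illustration, not an efficient implementation: it uses cofactor expansion along the first row, which is fine for the 3 x 3 examples in this section but grows quickly with n. The function names are hypothetical.

```python
def det(M):
    """Determinant by cofactor expansion along the first row (small matrices only)."""
    n = len(M)
    if n == 1:
        return M[0][0]
    total = 0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * M[0][j] * det(minor)
    return total

def principal_submatrix_determinants(A):
    """Determinants of the 1st, 2nd, ..., nth principal submatrices of A."""
    return [det([row[:k] for row in A[:k]]) for k in range(1, len(A) + 1)]

def is_positive_definite(A):
    """Determinant test for a symmetric matrix A (part (a) of the theorem)."""
    return all(d > 0 for d in principal_submatrix_determinants(A))

A5 = [[2, -1, -3], [-1, 2, 4], [-3, 4, 9]]   # matrix of Example 5
A4 = [[3, 1, 1], [1, 0, 2], [1, 2, 0]]       # indefinite matrix of Example 4

print(principal_submatrix_determinants(A5), is_positive_definite(A5))
print(principal_submatrix_determinants(A4), is_positive_definite(A4))
```

For the matrix of Example 5 the three determinants should reproduce the values 2, 3, 1 computed above, while the matrix of Example 4 fails the test at the second principal submatrix.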


OPTIONAL 


We conclude this section with an optional proof of Theorem 7.3.2. 

Proofs of Theorem 7.3.2(a) and (b) It follows from the principal axes theorem (Theorem
7.3.1) that there is an orthogonal change of variable $\mathbf{x} = P\mathbf{y}$ for which

$$\mathbf{x}^TA\mathbf{x} = \mathbf{y}^TD\mathbf{y} = \lambda_1y_1^2 + \lambda_2y_2^2 + \cdots + \lambda_ny_n^2 \tag{23}$$

where the $\lambda$'s are the eigenvalues of A. Moreover, it follows from the invertibility of P
that $\mathbf{y} \neq \mathbf{0}$ if and only if $\mathbf{x} \neq \mathbf{0}$, so the values of $\mathbf{x}^TA\mathbf{x}$ for $\mathbf{x} \neq \mathbf{0}$ are the same as the values
of $\mathbf{y}^TD\mathbf{y}$ for $\mathbf{y} \neq \mathbf{0}$. Thus, it follows from (23) that $\mathbf{x}^TA\mathbf{x} > 0$ for $\mathbf{x} \neq \mathbf{0}$ if and only if all
of the $\lambda$'s in that equation are positive, and that $\mathbf{x}^TA\mathbf{x} < 0$ for $\mathbf{x} \neq \mathbf{0}$ if and only if all of
the $\lambda$'s are negative. This proves parts (a) and (b).

Proof (c) Assume that A has at least one positive eigenvalue and at least one negative
eigenvalue, and to be specific, suppose that $\lambda_1 > 0$ and $\lambda_2 < 0$ in (23). Then

$\mathbf{x}^TA\mathbf{x} > 0$ if $y_1 = 1$ and all other y's are 0

and

$\mathbf{x}^TA\mathbf{x} < 0$ if $y_2 = 1$ and all other y's are 0

which proves that $\mathbf{x}^TA\mathbf{x}$ is indefinite. Conversely, if $\mathbf{x}^TA\mathbf{x} > 0$ for some x, then $\mathbf{y}^TD\mathbf{y} > 0$
for some y, so at least one of the $\lambda$'s in (23) must be positive. Similarly, if $\mathbf{x}^TA\mathbf{x} < 0$ for
some x, then $\mathbf{y}^TD\mathbf{y} < 0$ for some y, so at least one of the $\lambda$'s in (23) must be negative,
which completes the proof.


Exercise Set 7.3 

In Exercises 1-2, express the quadratic form in the matrix notation $\mathbf{x}^T A\mathbf{x}$, where $A$ is a symmetric matrix.

1. (a) $3x_1^2 + 7x_2^2$ (b) $4x_1^2 - 9x_2^2 - 6x_1x_2$

(c) $9x_1^2 - x_2^2 + 4x_3^2 + 6x_1x_2 - 8x_1x_3 + x_2x_3$

2. (a) $5x_1^2 + 5x_1x_2$ (b) $-7x_1x_2$

(c) $x_1^2 + x_2^2 - 3x_3^2 - 5x_1x_2 + 9x_1x_3$

In Exercises 3-4, find a formula for the quadratic form that does not use matrices.

3. $\begin{bmatrix} x & y \end{bmatrix} \begin{bmatrix} 2 & -3 \\ -3 & 5 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix}$

4. $\begin{bmatrix} x_1 & x_2 & x_3 \end{bmatrix} \begin{bmatrix} -2 & \tfrac{7}{2} & 1 \\ \tfrac{7}{2} & 0 & 6 \\ 1 & 6 & 3 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}$

In Exercises 5-8, find an orthogonal change of variables that eliminates the cross product terms in the quadratic form $Q$, and express $Q$ in terms of the new variables.

5. $Q = 2x_1^2 + 2x_2^2 - 2x_1x_2$

6. $Q = 5x_1^2 + 2x_2^2 + 4x_3^2 + 4x_1x_2$

7. $Q = 3x_1^2 + 4x_2^2 + 5x_3^2 + 4x_1x_2 - 4x_2x_3$

8. $Q = 2x_1^2 + 5x_2^2 + 5x_3^2 + 4x_1x_2 - 4x_1x_3 - 8x_2x_3$

In Exercises 9-10, express the quadratic equation in the matrix form $\mathbf{x}^T A\mathbf{x} + K\mathbf{x} + f = 0$, where $\mathbf{x}^T A\mathbf{x}$ is the associated quadratic form and $K$ is an appropriate matrix.

9. (a) $2x^2 + xy + x - 6y + 2 = 0$
(b) $y^2 + 7x - 8y - 5 = 0$

10. (a) $x^2 - xy + 5x + 8y - 3 = 0$
(b) $5xy = 8$

In Exercises 11-12, identify the conic section represented by the equation.

11. (a) $2x^2 + 5y^2 = 20$ (b) $x^2 - y^2 - 8 = 0$
(c) $7y^2 - 2x = 0$ (d) $x^2 + y^2 - 25 = 0$

12. (a) $4x^2 + 9y^2 = 1$ (b) $4x^2 - 5y^2 = 20$
(c) $-x^2 = 2y$ (d) $x^2 - 3 = -y^2$

In Exercises 13-16, identify the conic section represented by the equation by rotating axes to place the conic in standard position. Find an equation of the conic in the rotated coordinates, and find the angle of rotation.


13. $2x^2 - 4xy - y^2 + 8 = 0$ 14. $5x^2 + 4xy + 5y^2 = 9$

15. $11x^2 + 24xy + 4y^2 - 15 = 0$ 16. $x^2 + xy + y^2 = \tfrac{1}{2}$


In Exercises 17-18, determine by inspection whether the matrix is positive definite, negative definite, indefinite, positive semidefinite, or negative semidefinite.


17. (a) 


(d) 


1 

0 

1 

0 


0 " 

2 _ 

0 " 

0 


(b) 

(e) 


'-1 0 " 
0 — 2 _ 

"0 0 " 

0 -2 



O' 

2 


18. (a) 


(d) 


'2 

0 

0 

0 


O' 

— 5_ 

O' 

-5 




O' 

0 



O' 

5 


In Exercises 19-24, classify the quadratic form as positive definite, negative definite, indefinite, positive semidefinite, or negative semidefinite.

19. $x_1^2 + x_2^2$ 20. $-x_1^2 - 3x_2^2$ 21. $(x_1 - x_2)^2$

22. $-(x_1 - x_2)^2$ 23. $x_1^2 - x_2^2$ 24. $x_1x_2$

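In Exercises 25-26, show that the matrix $A$ is positive definite first by using Theorem 7.3.2 and then by using Theorem 7.3.4.

25. (a) $A = \begin{bmatrix} 5 & -2 \\ -2 & 5 \end{bmatrix}$ (b) $A = \begin{bmatrix} 2 & -1 & 0 \\ -1 & 2 & 0 \\ 0 & 0 & 5 \end{bmatrix}$

26. (a) $A = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}$ (b) $A = \begin{bmatrix} 3 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 3 \end{bmatrix}$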

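In Exercises 27-28, use Theorem 7.3.4 to classify the matrix as positive definite, negative definite, or indefinite.

27. (a) $A = \begin{bmatrix} 3 & 1 & 2 \\ 1 & -1 & 3 \\ 2 & 3 & 2 \end{bmatrix}$ (b) $A = \begin{bmatrix} -3 & 2 & 0 \\ 2 & -3 & 0 \\ 0 & 0 & -5 \end{bmatrix}$

28. (a) $A = \begin{bmatrix} 4 & 1 & -1 \\ 1 & 2 & 1 \\ -1 & 1 & 2 \end{bmatrix}$ (b) $A = \begin{bmatrix} -4 & -1 & 1 \\ -1 & -2 & -1 \\ 1 & -1 & -2 \end{bmatrix}$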

In Exercises 29-30, find all values of $k$ for which the quadratic form is positive definite.

29. $5x_1^2 + x_2^2 + kx_3^2 + 4x_1x_2 - 2x_1x_3 - 2x_2x_3$

30. $3x_1^2 + x_2^2 + 2x_3^2 - 2x_1x_3 + 2kx_2x_3$

31. Let $\mathbf{x}^T A\mathbf{x}$ be a quadratic form in the variables $x_1, x_2, \ldots, x_n$, and define $T: R^n \to R$ by $T(\mathbf{x}) = \mathbf{x}^T A\mathbf{x}$.

(a) Show that $T(\mathbf{x} + \mathbf{y}) = T(\mathbf{x}) + 2\mathbf{x}^T A\mathbf{y} + T(\mathbf{y})$.

(b) Show that $T(c\mathbf{x}) = c^2 T(\mathbf{x})$.

32. Express the quadratic form $(c_1x_1 + c_2x_2 + \cdots + c_nx_n)^2$ in the matrix notation $\mathbf{x}^T A\mathbf{x}$, where $A$ is symmetric.

33. In statistics, the quantities

\[ \bar{x} = \frac{1}{n}(x_1 + x_2 + \cdots + x_n) \]

and

\[ s_x^2 = \frac{1}{n-1}\left[(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2\right] \]

are called, respectively, the sample mean and sample variance of $\mathbf{x} = (x_1, x_2, \ldots, x_n)$.

(a) Express the quadratic form $s_x^2$ in the matrix notation $\mathbf{x}^T A\mathbf{x}$, where $A$ is symmetric.

(b) Is $s_x^2$ a positive definite quadratic form? Explain.

34. The graph in an $xyz$-coordinate system of an equation of form $ax^2 + by^2 + cz^2 = 1$ in which $a$, $b$, and $c$ are positive is a surface called a central ellipsoid in standard position (see the accompanying figure). This is the three-dimensional generalization of the ellipse $ax^2 + by^2 = 1$ in the $xy$-plane. The intersections of the ellipsoid $ax^2 + by^2 + cz^2 = 1$ with the coordinate axes determine three line segments called the axes of the ellipsoid. If a central ellipsoid is rotated about the origin so two or more of its axes do not coincide with any of the coordinate axes, then the resulting equation will have one or more cross product terms.

(a) Show that the equation

\[ \tfrac{4}{3}x^2 + \tfrac{4}{3}y^2 + \tfrac{4}{3}z^2 + \tfrac{4}{3}xy + \tfrac{4}{3}xz + \tfrac{4}{3}yz = 1 \]

represents an ellipsoid, and find the lengths of its axes. [Suggestion: Write the equation in the form $\mathbf{x}^T A\mathbf{x} = 1$ and make an orthogonal change of variable to eliminate the cross product terms.]

(b) What property must a symmetric $3 \times 3$ matrix have in order for the equation $\mathbf{x}^T A\mathbf{x} = 1$ to represent an ellipsoid?

Figure Ex-34

35. What property must a symmetric $2 \times 2$ matrix $A$ have for $\mathbf{x}^T A\mathbf{x} = 1$ to represent a circle?


Working with Proofs 


36. Prove: If $b \neq 0$, then the cross product term can be eliminated from the quadratic form $ax^2 + 2bxy + cy^2$ by rotating the coordinate axes through an angle $\theta$ that satisfies the equation

\[ \cot 2\theta = \frac{a - c}{2b} \]

37. Prove: If $A$ is an $n \times n$ symmetric matrix all of whose eigenvalues are nonnegative, then $\mathbf{x}^T A\mathbf{x} \geq 0$ for all nonzero $\mathbf{x}$ in the vector space $R^n$.


True-False Exercises 

TF. In parts (a)-(l) determine whether the statement is true or false, and justify your answer.

(a) If all eigenvalues of a symmetric matrix $A$ are positive, then $A$ is positive definite.

(b) $x_1^2 - x_2^2 + x_3^2 + 4x_1x_2x_3$ is a quadratic form.

(c) $(x_1 - 3x_2)^2$ is a quadratic form.

(d) A positive definite matrix is invertible.

(e) A symmetric matrix is either positive definite, negative definite, or indefinite.

(f) If $A$ is positive definite, then $-A$ is negative definite.

(g) $\mathbf{x} \cdot \mathbf{x}$ is a quadratic form for all $\mathbf{x}$ in $R^n$.

(h) If $A$ is symmetric and invertible, and if $\mathbf{x}^T A\mathbf{x}$ is a positive definite quadratic form, then $\mathbf{x}^T A^{-1}\mathbf{x}$ is also a positive definite quadratic form.

(i) If $A$ is symmetric and has only positive eigenvalues, then $\mathbf{x}^T A\mathbf{x}$ is a positive definite quadratic form.

(j) If $A$ is a $2 \times 2$ symmetric matrix with positive entries and $\det(A) > 0$, then $A$ is positive definite.

(k) If $A$ is symmetric, and if the quadratic form $\mathbf{x}^T A\mathbf{x}$ has no cross product terms, then $A$ must be a diagonal matrix.

(l) If $\mathbf{x}^T A\mathbf{x}$ is a positive definite quadratic form in two variables and $c \neq 0$, then the graph of the equation $\mathbf{x}^T A\mathbf{x} = c$ is an ellipse.


Working with Technology 

T1. Find an orthogonal matrix $P$ such that $P^T AP$ is diagonal.

\[ A = \begin{bmatrix} -2 & 1 & 1 & 1 \\ 1 & -2 & 1 & 1 \\ 1 & 1 & -2 & 1 \\ 1 & 1 & 1 & -2 \end{bmatrix} \]


T2. Use the eigenvalues of the following matrix to determine whether it is positive definite, negative definite, or indefinite, and then confirm your conclusion using Theorem 7.3.4.

\[ A = \begin{bmatrix} -5 & -3 & 0 & 3 & 0 \\ -3 & -2 & 0 & 2 & 0 \\ 0 & 0 & -1 & 1 & 1 \\ 3 & 2 & 1 & -8 & 2 \\ 0 & 0 & 1 & 2 & -7 \end{bmatrix} \]


7.4 Optimization Using Quadratic Forms 

Quadratic forms arise in various problems in which the maximum or minimum value of 
some quantity is required. In this section we will discuss some problems of this type. 


Constrained Extremum Problems

Our first goal in this section is to consider the problem of finding the maximum and minimum values of a quadratic form $\mathbf{x}^T A\mathbf{x}$ subject to the constraint $\|\mathbf{x}\| = 1$. Problems of this type arise in a wide variety of applications.

To visualize this problem geometrically in the case where $\mathbf{x}^T A\mathbf{x}$ is a quadratic form on $R^2$, view $z = \mathbf{x}^T A\mathbf{x}$ as the equation of some surface in a rectangular $xyz$-coordinate system and view $\|\mathbf{x}\| = 1$ as the unit circle centered at the origin of the $xy$-plane. Geometrically, the problem of finding the maximum and minimum values of $\mathbf{x}^T A\mathbf{x}$ subject to the requirement $\|\mathbf{x}\| = 1$ amounts to finding the highest and lowest points on the intersection of the surface with the right circular cylinder determined by the circle (Figure 7.4.1).

▲ Figure 7.4.1

The following theorem, whose proof is deferred to the end of the section, is the key 
result for solving problems of this type. 


THEOREM 7.4.1 Constrained Extremum Theorem

Let $A$ be a symmetric $n \times n$ matrix whose eigenvalues in order of decreasing size are $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_n$. Then:

(a) The quadratic form $\mathbf{x}^T A\mathbf{x}$ attains a maximum value and a minimum value on the set of vectors for which $\|\mathbf{x}\| = 1$.

(b) The maximum value attained in part (a) is $\lambda_1$, and it occurs at a unit eigenvector corresponding to the eigenvalue $\lambda_1$.

(c) The minimum value attained in part (a) is $\lambda_n$, and it occurs at a unit eigenvector corresponding to the eigenvalue $\lambda_n$.


The condition $\|\mathbf{x}\| = 1$ in this theorem is called a constraint, and the maximum or minimum value of $\mathbf{x}^T A\mathbf{x}$ subject to the constraint is called a constrained extremum. This constraint can also be expressed as $\mathbf{x}^T\mathbf{x} = 1$ or as $x_1^2 + x_2^2 + \cdots + x_n^2 = 1$, when convenient.


► EXAMPLE 1 Finding Constrained Extrema 

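Find the maximum and minimum values of the quadratic form

\[ z = 5x^2 + 5y^2 + 4xy \]

subject to the constraint $x^2 + y^2 = 1$.

Solution The quadratic form can be expressed in matrix notation as

\[ z = 5x^2 + 5y^2 + 4xy = \mathbf{x}^T A\mathbf{x} = \begin{bmatrix} x & y \end{bmatrix} \begin{bmatrix} 5 & 2 \\ 2 & 5 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} \]

We leave it for you to show that the eigenvalues of $A$ are $\lambda_1 = 7$ and $\lambda_2 = 3$ and that corresponding eigenvectors are

\[ \lambda_1 = 7: \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \qquad \lambda_2 = 3: \begin{bmatrix} -1 \\ 1 \end{bmatrix} \]

Normalizing these eigenvectors yields

\[ \lambda_1 = 7: \begin{bmatrix} \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} \end{bmatrix}, \qquad \lambda_2 = 3: \begin{bmatrix} -\frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} \end{bmatrix} \tag{1} \]

Thus, the constrained extrema are

constrained maximum: $z = 7$ at $(x, y) = \left(\frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}}\right)$
constrained minimum: $z = 3$ at $(x, y) = \left(-\frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}}\right)$

Remark Since the negatives of the eigenvectors in (1) are also unit eigenvectors, they too produce the maximum and minimum values of $z$; that is, the constrained maximum $z = 7$ also occurs at the point $(x, y) = \left(-\frac{1}{\sqrt{2}}, -\frac{1}{\sqrt{2}}\right)$ and the constrained minimum $z = 3$ at $(x, y) = \left(\frac{1}{\sqrt{2}}, -\frac{1}{\sqrt{2}}\right)$.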
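The eigenvalue computation that this example leaves to the reader can be cross-checked numerically. The sketch below is not from the text: it assumes a $2 \times 2$ symmetric matrix with a nonzero off-diagonal entry, uses the closed-form eigenvalues of such a matrix, and then compares them with a brute-force scan of the unit circle. The names `sym2_eigen` and `q` are illustrative.

```python
import math


def sym2_eigen(a, b, c):
    """Eigenpairs of the symmetric matrix [[a, b], [b, c]], largest eigenvalue first.

    Assumes b != 0, so that (b, lam - a) is a nonzero eigenvector for each lam.
    """
    if b == 0:
        raise ValueError("matrix is already diagonal; read the eigenpairs off directly")
    mean, radius = (a + c) / 2, math.hypot((a - c) / 2, b)
    pairs = []
    for lam in (mean + radius, mean - radius):
        norm = math.hypot(b, lam - a)
        pairs.append((lam, (b / norm, (lam - a) / norm)))
    return pairs


def q(x, y):
    """The quadratic form of Example 1."""
    return 5 * x**2 + 5 * y**2 + 4 * x * y


(lam_max, v_max), (lam_min, v_min) = sym2_eigen(5, 2, 5)
print(lam_max, v_max, q(*v_max))   # eigenvalue 7 and a unit vector where z is about 7
print(lam_min, v_min, q(*v_min))   # eigenvalue 3 and a unit vector where z is about 3

# Brute-force check: sample the unit circle at one-degree steps.
samples = [q(math.cos(math.radians(d)), math.sin(math.radians(d))) for d in range(360)]
print(max(samples), min(samples))
```

The unit vector returned for the smaller eigenvalue is the negative of the one listed in (1), which is consistent with the remark above: either sign gives the same value of $z$.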



▲ Figure 7.4.2 A rectangle inscribed in the ellipse $4x^2 + 9y^2 = 36$.


► EXAMPLE 2 A Constrained Extremum Problem 

A rectangle is to be inscribed in the ellipse $4x^2 + 9y^2 = 36$, as shown in Figure 7.4.2. Use eigenvalue methods to find nonnegative values of $x$ and $y$ that produce the inscribed rectangle with maximum area.

Solution The area $z$ of the inscribed rectangle is given by $z = 4xy$, so the problem is to maximize the quadratic form $z = 4xy$ subject to the constraint $4x^2 + 9y^2 = 36$. In this problem, the graph of the constraint equation is an ellipse rather than the unit circle as required in Theorem 7.4.1, but we can remedy this problem by rewriting the constraint as

\[ \left(\frac{x}{3}\right)^2 + \left(\frac{y}{2}\right)^2 = 1 \]

and defining new variables, $x_1$ and $y_1$, by the equations

\[ x = 3x_1 \quad \text{and} \quad y = 2y_1 \]

This enables us to reformulate the problem as follows:

\[ \text{maximize } z = 4xy = 24x_1y_1 \]

subject to the constraint

\[ x_1^2 + y_1^2 = 1 \]

To solve this problem, we will write the quadratic form $z = 24x_1y_1$ as

\[ z = \mathbf{x}^T A\mathbf{x} = \begin{bmatrix} x_1 & y_1 \end{bmatrix} \begin{bmatrix} 0 & 12 \\ 12 & 0 \end{bmatrix} \begin{bmatrix} x_1 \\ y_1 \end{bmatrix} \]

We now leave it for you to show that the largest eigenvalue of $A$ is $\lambda = 12$ and that the only corresponding unit eigenvector with nonnegative entries is

\[ \mathbf{x} = \begin{bmatrix} x_1 \\ y_1 \end{bmatrix} = \begin{bmatrix} \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} \end{bmatrix} \]

Thus, the maximum area is $z = 12$, and this occurs when

\[ x = 3x_1 = \frac{3}{\sqrt{2}} \quad \text{and} \quad y = 2y_1 = \frac{2}{\sqrt{2}} \]
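The eigenvalue claim left to the reader in this example takes only a few lines to verify; the computation below is a worked check, not part of the original text. The characteristic polynomial of $A$ is

\[ \det(\lambda I - A) = \begin{vmatrix} \lambda & -12 \\ -12 & \lambda \end{vmatrix} = \lambda^2 - 144 = (\lambda - 12)(\lambda + 12) \]

so the eigenvalues are $\lambda = 12$ and $\lambda = -12$. For $\lambda = 12$ the system $(\lambda I - A)\mathbf{x} = \mathbf{0}$ reduces to $12x_1 - 12y_1 = 0$, that is, $x_1 = y_1$. Combined with $x_1^2 + y_1^2 = 1$ and the requirement of nonnegative entries, this gives $x_1 = y_1 = \frac{1}{\sqrt{2}}$, and then

\[ z = 24x_1y_1 = 24 \cdot \frac{1}{\sqrt{2}} \cdot \frac{1}{\sqrt{2}} = 12 \]

in agreement with the maximum area found above.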


Constrained Extrema and 
Level Curves 



A useful way of visualizing the behavior of a function $f(x, y)$ of two variables is to consider the curves in the $xy$-plane along which $f(x, y)$ is constant. These curves have equations of the form

\[ f(x, y) = k \]

and are called the level curves of $f$ (Figure 7.4.3). In particular, the level curves of a quadratic form $\mathbf{x}^T A\mathbf{x}$ on $R^2$ have equations of the form

\[ \mathbf{x}^T A\mathbf{x} = k \tag{2} \]

so the maximum and minimum values of $\mathbf{x}^T A\mathbf{x}$ subject to the constraint $\|\mathbf{x}\| = 1$ are the largest and smallest values of $k$ for which the graph of (2) intersects the unit circle. Typically, such values of $k$ produce level curves that just touch the unit circle (Figure 7.4.4), and the coordinates of the points where the level curves just touch produce the vectors that maximize or minimize $\mathbf{x}^T A\mathbf{x}$ subject to the constraint $\|\mathbf{x}\| = 1$.

▲ Figure 7.4.4


► EXAMPLE 3 Example 1 Revisited Using Level Curves

In Example 1 (and its following remark) we found the maximum and minimum values of the quadratic form

\[ z = 5x^2 + 5y^2 + 4xy \]

subject to the constraint $x^2 + y^2 = 1$. We showed that the constrained maximum is $z = 7$, which is attained at the points

\[ (x, y) = \left(\frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}}\right) \quad \text{and} \quad (x, y) = \left(-\frac{1}{\sqrt{2}}, -\frac{1}{\sqrt{2}}\right) \tag{3} \]

and that the constrained minimum is $z = 3$, which is attained at the points

\[ (x, y) = \left(-\frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}}\right) \quad \text{and} \quad (x, y) = \left(\frac{1}{\sqrt{2}}, -\frac{1}{\sqrt{2}}\right) \tag{4} \]

Geometrically, this means that the level curve $5x^2 + 5y^2 + 4xy = 7$ should just touch the unit circle at the points in (3), and the level curve $5x^2 + 5y^2 + 4xy = 3$ should just touch it at the points in (4). All of this is consistent with Figure 7.4.5.

► Figure 7.4.5


CALCULUS REQUIRED

Relative Extrema of Functions of Two Variables

We will conclude this section by showing how quadratic forms can be used to study characteristics of real-valued functions of two variables.

Recall that if a function $f(x, y)$ has first-order partial derivatives, then its relative maxima and minima, if any, occur at points where the conditions

\[ f_x(x, y) = 0 \quad \text{and} \quad f_y(x, y) = 0 \]

are both true. These are called critical points of $f$. The specific behavior of $f$ at a critical point $(x_0, y_0)$ is determined by the sign of

\[ D(x, y) = f(x, y) - f(x_0, y_0) \tag{5} \]

at points $(x, y)$ that are close to, but different from, $(x_0, y_0)$:

If $D(x, y) > 0$ at points $(x, y)$ that are sufficiently close to, but different from, $(x_0, y_0)$, then $f(x_0, y_0) < f(x, y)$ at such points and $f$ is said to have a relative minimum at $(x_0, y_0)$ (Figure 7.4.6a).

If $D(x, y) < 0$ at points $(x, y)$ that are sufficiently close to, but different from, $(x_0, y_0)$, then $f(x_0, y_0) > f(x, y)$ at such points and $f$ is said to have a relative maximum at $(x_0, y_0)$ (Figure 7.4.6b).

If $D(x, y)$ has both positive and negative values inside every circle centered at $(x_0, y_0)$, then there are points $(x, y)$ that are arbitrarily close to $(x_0, y_0)$ at which $f(x_0, y_0) < f(x, y)$ and points $(x, y)$ that are arbitrarily close to $(x_0, y_0)$ at which $f(x_0, y_0) > f(x, y)$. In this case we say that $f$ has a saddle point at $(x_0, y_0)$ (Figure 7.4.6c).

▲ Figure 7.4.6 (a) Relative minimum at (0, 0); (b) relative maximum at (0, 0); (c) saddle point at (0, 0)

In general, it can be difficult to determine the sign of (5) directly. However, the 
following theorem, which is proved in calculus, makes it possible to analyze critical 
points using derivatives. 


THEOREM 7.4.2 Second Derivative Test

Suppose that $(x_0, y_0)$ is a critical point of $f(x, y)$ and that $f$ has continuous second-order partial derivatives in some circular region centered at $(x_0, y_0)$. Then:

(a) $f$ has a relative minimum at $(x_0, y_0)$ if

\[ f_{xx}(x_0, y_0)f_{yy}(x_0, y_0) - f_{xy}^2(x_0, y_0) > 0 \quad \text{and} \quad f_{xx}(x_0, y_0) > 0 \]

(b) $f$ has a relative maximum at $(x_0, y_0)$ if

\[ f_{xx}(x_0, y_0)f_{yy}(x_0, y_0) - f_{xy}^2(x_0, y_0) > 0 \quad \text{and} \quad f_{xx}(x_0, y_0) < 0 \]

(c) $f$ has a saddle point at $(x_0, y_0)$ if

\[ f_{xx}(x_0, y_0)f_{yy}(x_0, y_0) - f_{xy}^2(x_0, y_0) < 0 \]

(d) The test is inconclusive if

\[ f_{xx}(x_0, y_0)f_{yy}(x_0, y_0) - f_{xy}^2(x_0, y_0) = 0 \]


Our interest here is in showing how to reformulate this theorem using properties of symmetric matrices. For this purpose we consider the symmetric matrix

\[ H(x, y) = \begin{bmatrix} f_{xx}(x, y) & f_{xy}(x, y) \\ f_{xy}(x, y) & f_{yy}(x, y) \end{bmatrix} \]

which is called the Hessian or Hessian matrix of $f$ in honor of the German mathematician and scientist Ludwig Otto Hesse (1811-1874). The notation $H(x, y)$ emphasizes that the entries in the matrix depend on $x$ and $y$. The Hessian is of interest because

\[ \det[H(x_0, y_0)] = \begin{vmatrix} f_{xx}(x_0, y_0) & f_{xy}(x_0, y_0) \\ f_{xy}(x_0, y_0) & f_{yy}(x_0, y_0) \end{vmatrix} = f_{xx}(x_0, y_0)f_{yy}(x_0, y_0) - f_{xy}^2(x_0, y_0) \]

is the expression that appears in Theorem 7.4.2. We can now reformulate the second derivative test as follows.

THEOREM 7.4.3 Hessian Form of the Second Derivative Test

Suppose that $(x_0, y_0)$ is a critical point of $f(x, y)$ and that $f$ has continuous second-order partial derivatives in some circular region centered at $(x_0, y_0)$. If $H(x_0, y_0)$ is the Hessian of $f$ at $(x_0, y_0)$, then:

(a) $f$ has a relative minimum at $(x_0, y_0)$ if $H(x_0, y_0)$ is positive definite.

(b) $f$ has a relative maximum at $(x_0, y_0)$ if $H(x_0, y_0)$ is negative definite.

(c) $f$ has a saddle point at $(x_0, y_0)$ if $H(x_0, y_0)$ is indefinite.

(d) The test is inconclusive otherwise.


We will prove part (a). The proofs of the remaining parts will be left as exercises. 


Proof (a) If $H(x_0, y_0)$ is positive definite, then Theorem 7.3.4 implies that the principal submatrices of $H(x_0, y_0)$ have positive determinants. Thus,

\[ \det[H(x_0, y_0)] = \begin{vmatrix} f_{xx}(x_0, y_0) & f_{xy}(x_0, y_0) \\ f_{xy}(x_0, y_0) & f_{yy}(x_0, y_0) \end{vmatrix} = f_{xx}(x_0, y_0)f_{yy}(x_0, y_0) - f_{xy}^2(x_0, y_0) > 0 \]

and

\[ \det[f_{xx}(x_0, y_0)] = f_{xx}(x_0, y_0) > 0 \]

so $f$ has a relative minimum at $(x_0, y_0)$ by part (a) of Theorem 7.4.2.


► EXAMPLE 4 Using the Hessian to Classify Relative Extrema 

Find the critical points of the function

\[ f(x, y) = \tfrac{1}{3}x^3 + xy^2 - 8xy + 3 \]

and use the eigenvalues of the Hessian matrix at those points to determine which of them, if any, are relative maxima, relative minima, or saddle points.

Solution To find both the critical points and the Hessian matrix we will need to calculate the first and second partial derivatives of $f$. These derivatives are

\[ f_x(x, y) = x^2 + y^2 - 8y, \quad f_y(x, y) = 2xy - 8x, \quad f_{xy}(x, y) = 2y - 8 \]

\[ f_{xx}(x, y) = 2x, \quad f_{yy}(x, y) = 2x \]

Thus, the Hessian matrix is

\[ H(x, y) = \begin{bmatrix} f_{xx}(x, y) & f_{xy}(x, y) \\ f_{xy}(x, y) & f_{yy}(x, y) \end{bmatrix} = \begin{bmatrix} 2x & 2y - 8 \\ 2y - 8 & 2x \end{bmatrix} \]

To find the critical points we set $f_x$ and $f_y$ equal to zero. This yields the equations

\[ f_x(x, y) = x^2 + y^2 - 8y = 0 \quad \text{and} \quad f_y(x, y) = 2xy - 8x = 2x(y - 4) = 0 \]

Solving the second equation yields $x = 0$ or $y = 4$. Substituting $x = 0$ in the first equation and solving for $y$ yields $y = 0$ or $y = 8$; and substituting $y = 4$ into the first equation and solving for $x$ yields $x = 4$ or $x = -4$. Thus, we have four critical points:

\[ (0, 0), \quad (0, 8), \quad (4, 4), \quad (-4, 4) \]

Evaluating the Hessian matrix at these points yields

\[ H(0, 0) = \begin{bmatrix} 0 & -8 \\ -8 & 0 \end{bmatrix}, \qquad H(0, 8) = \begin{bmatrix} 0 & 8 \\ 8 & 0 \end{bmatrix} \]

\[ H(4, 4) = \begin{bmatrix} 8 & 0 \\ 0 & 8 \end{bmatrix}, \qquad H(-4, 4) = \begin{bmatrix} -8 & 0 \\ 0 & -8 \end{bmatrix} \]

We leave it for you to find the eigenvalues of these matrices and deduce the following classifications of the stationary points:

Critical Point (x0, y0)    λ1    λ2    Classification
(0, 0)                      8    -8    Saddle point
(0, 8)                      8    -8    Saddle point
(4, 4)                      8     8    Relative minimum
(-4, 4)                    -8    -8    Relative maximum
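The classifications in this table can be reproduced with a short script. The sketch below is not part of the original text: it hard-codes the Hessian derived in this example, uses the closed-form eigenvalues of a symmetric $2 \times 2$ matrix, and applies the Hessian form of the second derivative test. The function names are illustrative.

```python
import math


def hessian(x, y):
    """Hessian of f(x, y) = x**3/3 + x*y**2 - 8*x*y + 3, as derived in Example 4."""
    return [[2 * x, 2 * y - 8],
            [2 * y - 8, 2 * x]]


def classify(h):
    """Classify a critical point from the eigenvalues of a symmetric 2 x 2 Hessian."""
    (a, b), (_, c) = h
    mean, radius = (a + c) / 2, math.hypot((a - c) / 2, b)
    lam1, lam2 = mean + radius, mean - radius   # lam1 >= lam2
    if lam2 > 0:
        return "relative minimum"    # positive definite
    if lam1 < 0:
        return "relative maximum"    # negative definite
    if lam1 > 0 and lam2 < 0:
        return "saddle point"        # indefinite
    return "inconclusive"            # some eigenvalue is zero


for point in [(0, 0), (0, 8), (4, 4), (-4, 4)]:
    print(point, classify(hessian(*point)))
```

With floating-point Hessians the comparisons against zero would need a tolerance; exact integer entries, as here, avoid that issue.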


OPTIONAL

We conclude this section with an optional proof of Theorem 7.4.1.


Proof of Theorem 7.4.1 The first step in the proof is to show that $\mathbf{x}^T A\mathbf{x}$ has constrained maximum and minimum values for $\|\mathbf{x}\| = 1$. Since $A$ is symmetric, the principal axes theorem (Theorem 7.3.1) implies that there is an orthogonal change of variable $\mathbf{x} = P\mathbf{y}$ such that

\[ \mathbf{x}^T A\mathbf{x} = \lambda_1 y_1^2 + \lambda_2 y_2^2 + \cdots + \lambda_n y_n^2 \tag{6} \]

in which $\lambda_1, \lambda_2, \ldots, \lambda_n$ are the eigenvalues of $A$. Let us assume that $\|\mathbf{x}\| = 1$ and that the column vectors of $P$ (which are unit eigenvectors of $A$) have been ordered so that

\[ \lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_n \tag{7} \]

Since the matrix $P$ is orthogonal, multiplication by $P$ is length preserving, from which it follows that $\|\mathbf{y}\| = \|\mathbf{x}\| = 1$; that is,

\[ y_1^2 + y_2^2 + \cdots + y_n^2 = 1 \]

It follows from this equation and (7) that

\[ \lambda_n = \lambda_n(y_1^2 + y_2^2 + \cdots + y_n^2) \leq \lambda_1 y_1^2 + \lambda_2 y_2^2 + \cdots + \lambda_n y_n^2 \leq \lambda_1(y_1^2 + y_2^2 + \cdots + y_n^2) = \lambda_1 \]

and hence from (6) that

\[ \lambda_n \leq \mathbf{x}^T A\mathbf{x} \leq \lambda_1 \]

This shows that all values of $\mathbf{x}^T A\mathbf{x}$ for which $\|\mathbf{x}\| = 1$ lie between the largest and smallest eigenvalues of $A$. Now let $\mathbf{x}$ be a unit eigenvector corresponding to $\lambda_1$. Then

\[ \mathbf{x}^T A\mathbf{x} = \mathbf{x}^T(\lambda_1\mathbf{x}) = \lambda_1\mathbf{x}^T\mathbf{x} = \lambda_1\|\mathbf{x}\|^2 = \lambda_1 \]

which shows that $\mathbf{x}^T A\mathbf{x}$ has $\lambda_1$ as a constrained maximum and that this maximum occurs if $\mathbf{x}$ is a unit eigenvector of $A$ corresponding to $\lambda_1$. Similarly, if $\mathbf{x}$ is a unit eigenvector corresponding to $\lambda_n$, then

\[ \mathbf{x}^T A\mathbf{x} = \mathbf{x}^T(\lambda_n\mathbf{x}) = \lambda_n\mathbf{x}^T\mathbf{x} = \lambda_n\|\mathbf{x}\|^2 = \lambda_n \]

so $\mathbf{x}^T A\mathbf{x}$ has $\lambda_n$ as a constrained minimum and this minimum occurs if $\mathbf{x}$ is a unit eigenvector of $A$ corresponding to $\lambda_n$. This completes the proof.


Exercise Set 7.4

In Exercises 1-4, find the maximum and minimum values of the given quadratic form subject to the constraint $x^2 + y^2 = 1$, and determine the values of $x$ and $y$ at which the maximum and minimum occur.

1. $5x^2 - y^2$ 2. $xy$ 3. $3x^2 + 7y^2$ 4. $5x^2 + 5xy$

In Exercises 5-6, find the maximum and minimum values of the given quadratic form subject to the constraint

\[ x^2 + y^2 + z^2 = 1 \]

and determine the values of $x$, $y$, and $z$ at which the maximum and minimum occur.

5. 9x 2 + 4y 2 + 3z 2 6. lx 2 + y 2 + z 2 + 2xy + 2xz 

7. Use the method of Example 2 to find the maximum and minimum values of $xy$ subject to the constraint $4x^2 + 8y^2 = 16$.

8. Use the method of Example 2 to find the maximum and minimum values of $x^2 + xy + 2y^2$ subject to the constraint $x^2 + 3y^2 = 16$.

In Exercises 9-10, draw the unit circle and the level curves corresponding to the given quadratic form. Show that the unit circle intersects each of these curves in exactly two places, label the intersection points, and verify that the constrained extrema occur at those points.

9. $5x^2 - y^2$ 10. $xy$

11. (a) Show that the function $f(x, y) = 4xy - x^4 - y^4$ has critical points at $(0, 0)$, $(1, 1)$, and $(-1, -1)$.

(b) Use the Hessian form of the second derivative test to show that $f$ has relative maxima at $(1, 1)$ and $(-1, -1)$ and a saddle point at $(0, 0)$.

12. (a) Show that the function $f(x, y) = x^3 - 6xy - y^3$ has critical points at $(0, 0)$ and $(-2, 2)$.

(b) Use the Hessian form of the second derivative test to show that $f$ has a relative maximum at $(-2, 2)$ and a saddle point at $(0, 0)$.

In Exercises 13-16, find the critical points of $f$, if any, and classify them as relative maxima, relative minima, or saddle points.

13. $f(x, y) = x^3 - 3xy - y^3$

14. $f(x, y) = x^3 - 3xy + y^3$

15. $f(x, y) = x^2 + 2y^2 - x^2y$

16. $f(x, y) = x^3 + y^3 - 3x - 3y$

17. A rectangle whose center is at the origin and whose sides are parallel to the coordinate axes is to be inscribed in the ellipse $x^2 + 25y^2 = 25$. Use the method of Example 2 to find nonnegative values of $x$ and $y$ that produce the inscribed rectangle with maximum area.


18. Suppose that $\mathbf{x}$ is a unit eigenvector of a matrix $A$ corresponding to an eigenvalue 2. What is the value of $\mathbf{x}^T A\mathbf{x}$?

19. (a) Show that the functions

\[ f(x, y) = x^4 + y^4 \quad \text{and} \quad g(x, y) = x^4 - y^4 \]

have a critical point at $(0, 0)$ but the second derivative test is inconclusive at that point.

(b) Give a reasonable argument to show that $f$ has a relative minimum at $(0, 0)$ and $g$ has a saddle point at $(0, 0)$.

20. Suppose that the Hessian matrix of a certain quadratic form $f(x, y)$ is


What can you say about the location and classification of the critical points of $f$?

21. Suppose that $A$ is an $n \times n$ symmetric matrix and

\[ q(\mathbf{x}) = \mathbf{x}^T A\mathbf{x} \]

where $\mathbf{x}$ is a vector in $R^n$ that is expressed in column form. What can you say about the value of $q$ if $\mathbf{x}$ is a unit eigenvector corresponding to an eigenvalue $\lambda$ of $A$?


Working with Proofs 

22. Prove: If $\mathbf{x}^T A\mathbf{x}$ is a quadratic form whose minimum and maximum values subject to the constraint $\|\mathbf{x}\| = 1$ are $m$ and $M$, respectively, then for each number $c$ in the interval $m \leq c \leq M$, there is a unit vector $\mathbf{x}_c$ such that $\mathbf{x}_c^T A\mathbf{x}_c = c$. [Hint: In the case where $m < M$, let $\mathbf{u}_m$ and $\mathbf{u}_M$ be unit eigenvectors of $A$ such that $\mathbf{u}_m^T A\mathbf{u}_m = m$ and $\mathbf{u}_M^T A\mathbf{u}_M = M$, and let

\[ \mathbf{x}_c = \sqrt{\frac{M - c}{M - m}}\,\mathbf{u}_m + \sqrt{\frac{c - m}{M - m}}\,\mathbf{u}_M \]

Show that $\mathbf{x}_c^T A\mathbf{x}_c = c$.]

True-False Exercises 

TF. In parts (a)-(e) determine whether the statement is true or false, and justify your answer.

(a) A quadratic form must have either a maximum or minimum value.

(b) The maximum value of a quadratic form $\mathbf{x}^T A\mathbf{x}$ subject to the constraint $\|\mathbf{x}\| = 1$ occurs at a unit eigenvector corresponding to the largest eigenvalue of $A$.

(c) The Hessian matrix of a function $f$ with continuous second-order partial derivatives is a symmetric matrix.

(d) If $(x_0, y_0)$ is a critical point of a function $f$ and the Hessian of $f$ at $(x_0, y_0)$ is 0, then $f$ has neither a relative maximum nor a relative minimum at $(x_0, y_0)$.

(e) If $A$ is a symmetric matrix and $\det(A) < 0$, then the minimum of $\mathbf{x}^T A\mathbf{x}$ subject to the constraint $\|\mathbf{x}\| = 1$ is negative.

Working with Technology

T1. Find the maximum and minimum values of the following quadratic form subject to the stated constraint, and specify the points at which those values are attained.

\[ w = 2x^2 + y^2 + z^2 + 2xy + 2xz; \quad x^2 + y^2 + z^2 = 1 \]

T2. Suppose that the temperature at a point $(x, y)$ on a metal plate is $T(x, y) = 4x^2 - 4xy + y^2$. An ant walking on the plate traverses a circle of radius 5 centered at the origin. What are the highest and lowest temperatures encountered by the ant?

T3. The accompanying figure shows the intersection of the surface $z = x^2 + 4y^2$ (called an elliptic paraboloid) and the surface $x^2 + y^2 = 1$ (called a right circular cylinder). Find the highest and lowest points on the curve of intersection.



◄ Figure Ex-T3 


7.5 Hermitian, Unitary, and Normal Matrices 

We showed in Section 7.2 that every symmetric matrix with real entries is orthogonally diagonalizable, and conversely that every orthogonally diagonalizable matrix with real entries is symmetric. In this section we will be concerned with the diagonalization problem for matrices with complex entries.


Real Matrices Versus Complex Matrices

As discussed in Section 5.3, we distinguish between matrices whose entries must be real numbers, called real matrices, and matrices whose entries may be either real numbers or complex numbers, called complex matrices. When convenient, you can think of a real matrix as a complex matrix each of whose entries has zero as its imaginary part. Similarly, we distinguish between real vectors (those in $R^n$) and complex vectors (those in $C^n$).


Hermitian and Unitary 
Matrices 


The transpose operation is less important for complex matrices than for real matrices. 
A more useful operation for complex matrices is given in the following definition. 


DEFINITION 1 If $A$ is a complex matrix, then the conjugate transpose of $A$, denoted by $A^*$, is defined by

\[ A^* = \overline{A}^T \tag{1} \]


Remark Note that the order in which the transpose and conjugation operations are performed in Formula (1) does not matter (see Theorem 5.3.2b). Moreover, if $A$ is a real matrix, then Formula (1) simplifies to $A^* = (\overline{A})^T = A^T$, so the conjugate transpose is the same as the transpose in that case.


► EXAMPLE 1 Conjugate Transpose 

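Find the conjugate transpose $A^*$ of the matrix

\[ A = \begin{bmatrix} 1 + i & -i & 0 \\ 2 & 3 - 2i & i \end{bmatrix} \]

Solution We have

\[ \overline{A} = \begin{bmatrix} 1 - i & i & 0 \\ 2 & 3 + 2i & -i \end{bmatrix} \quad \text{and hence} \quad A^* = \overline{A}^T = \begin{bmatrix} 1 - i & 2 \\ i & 3 + 2i \\ 0 & -i \end{bmatrix} \]

◄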
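Python's built-in complex type makes this computation easy to reproduce. The sketch below is not part of the original text; it stores a matrix as a list of rows, and the function name `conjugate_transpose` is illustrative.

```python
def conjugate_transpose(a):
    """Return A* for a matrix stored as a list of rows of (complex) numbers."""
    # zip(*a) walks the columns of a; conjugating each entry gives the rows of A*.
    return [[entry.conjugate() for entry in column] for column in zip(*a)]


A = [[1 + 1j, -1j, 0],
     [2, 3 - 2j, 1j]]

A_star = conjugate_transpose(A)
for row in A_star:
    print(row)

# Applying the operation twice recovers A, illustrating (A*)* = A.
print(conjugate_transpose(A_star) == A)  # True
```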


The following theorem, parts of which are given as exercises, shows that the basic 
algebraic properties of the conjugate transpose operation are similar to those of the 
transpose (compare to Theorem 1.4.8). 


THEOREM 7.5.1 If $k$ is a complex scalar, and if $A$ and $B$ are complex matrices whose sizes are such that the stated operations can be performed, then:

(a) $(A^*)^* = A$

(b) $(A + B)^* = A^* + B^*$

(c) $(A - B)^* = A^* - B^*$

(d) $(kA)^* = \overline{k}A^*$

(e) $(AB)^* = B^*A^*$


We now define two new classes of matrices that will be important in our study of 
diagonalization in C" . 



DEFINITION 2 A square matrix $A$ is said to be unitary if

\[ AA^* = A^*A = I \tag{2} \]

or, equivalently, if

\[ A^* = A^{-1} \tag{3} \]

and it is said to be Hermitian if

\[ A^* = A \tag{4} \]

To show that a matrix is unitary it suffices to show that either $AA^* = I$ or $A^*A = I$ since either equation implies the other.


If $A$ is a real matrix, then $A^* = A^T$, in which case (3) becomes $A^T = A^{-1}$ and (4) becomes $A^T = A$. Thus, the unitary matrices are complex generalizations of the real orthogonal matrices and the Hermitian matrices are complex generalizations of the real symmetric matrices.


► EXAMPLE 2 Recognizing Hermitian Matrices 

Hermitian matrices are easy to recognize because their diagonal entries are real (why?) and the entries that are symmetrically positioned across the main diagonal are complex conjugates. Thus, for example, we can tell by inspection that

\[ A = \begin{bmatrix} 1 & i & 1 + i \\ -i & -5 & 2 - i \\ 1 - i & 2 + i & 3 \end{bmatrix} \]

is Hermitian.
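The inspection described in this example amounts to comparing $A$ with $A^*$ entry by entry. The sketch below is not part of the original text; the helper names are illustrative, and the exact equality test is appropriate only for exactly representable entries such as the small integers used here (floating-point data would call for a tolerance).

```python
def conjugate_transpose(a):
    """Return A* for a matrix stored as a list of rows of (complex) numbers."""
    return [[entry.conjugate() for entry in column] for column in zip(*a)]


def is_hermitian(a):
    """True when the matrix a equals its conjugate transpose."""
    return a == conjugate_transpose(a)


A = [[1, 1j, 1 + 1j],
     [-1j, -5, 2 - 1j],
     [1 - 1j, 2 + 1j, 3]]

print(is_hermitian(A))                    # True
print(is_hermitian([[1, 1j], [1j, 1]]))   # False: the off-diagonal entries are not conjugates
```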


Hermitian matrices are named in honor of the French mathematician Charles Hermite (1822-1901).

► EXAMPLE 3 Recognizing Unitary Matrices


Unlike Hermitian matrices, unitary matrices are not readily identifiable by inspection. The most direct way to identify such matrices is to determine whether the matrix satisfies Equation (2) or Equation (3). We leave it for you to verify that the following matrix is unitary:


A = 


V2 




V2 _ 


◄ 


In Theorem 7.2.2 we established that real symmetric matrices have real eigenvalues and that eigenvectors from different eigenspaces are orthogonal. That theorem is a special case of our next theorem in which orthogonality is with respect to the complex Euclidean inner product on $C^n$. We will prove part (b) of the theorem and leave the proof of part (a) for the exercises. In our proof we will make use of the fact that the relationship $\mathbf{u} \cdot \mathbf{v} = \overline{\mathbf{v}}^T\mathbf{u}$ given in Formula (5) of Section 5.3 can be expressed in terms of the conjugate transpose as

\[ \mathbf{u} \cdot \mathbf{v} = \mathbf{v}^*\mathbf{u} \tag{5} \]


THEOREM 7.5.2 If $A$ is a Hermitian matrix, then:

(a) The eigenvalues of $A$ are all real numbers.

(b) Eigenvectors from different eigenspaces are orthogonal.


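Proof (b) Let $\mathbf{v}_1$ and $\mathbf{v}_2$ be eigenvectors of $A$ corresponding to distinct eigenvalues $\lambda_1$ and $\lambda_2$. Using Formula (5) and the facts that $\lambda_1 = \overline{\lambda}_1$, $\lambda_2 = \overline{\lambda}_2$, and $A = A^*$, we can write

\[ \begin{aligned} \lambda_1(\mathbf{v}_2 \cdot \mathbf{v}_1) &= (\lambda_1\mathbf{v}_1)^*\mathbf{v}_2 = (A\mathbf{v}_1)^*\mathbf{v}_2 = (\mathbf{v}_1^*A^*)\mathbf{v}_2 \\ &= (\mathbf{v}_1^*A)\mathbf{v}_2 = \mathbf{v}_1^*(A\mathbf{v}_2) \\ &= \mathbf{v}_1^*(\lambda_2\mathbf{v}_2) = \lambda_2(\mathbf{v}_1^*\mathbf{v}_2) = \lambda_2(\mathbf{v}_2 \cdot \mathbf{v}_1) \end{aligned} \]

This implies that $(\lambda_1 - \lambda_2)(\mathbf{v}_2 \cdot \mathbf{v}_1) = 0$ and hence that $\mathbf{v}_2 \cdot \mathbf{v}_1 = 0$ (since $\lambda_1 \neq \lambda_2$).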


► EXAMPLE 4 Eigenvalues and Eigenvectors of a Hermitian Matrix 

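Confirm that the Hermitian matrix

\[ A = \begin{bmatrix} 2 & 1 + i \\ 1 - i & 3 \end{bmatrix} \]

has real eigenvalues and that eigenvectors from different eigenspaces are orthogonal.

Solution The characteristic polynomial of $A$ is

\[ \det(\lambda I - A) = \begin{vmatrix} \lambda - 2 & -1 - i \\ -1 + i & \lambda - 3 \end{vmatrix} = (\lambda - 2)(\lambda - 3) - (-1 - i)(-1 + i) = (\lambda^2 - 5\lambda + 6) - 2 = (\lambda - 1)(\lambda - 4) \]

so the eigenvalues of $A$ are $\lambda = 1$ and $\lambda = 4$, which are real. Bases for the eigenspaces of $A$ can be obtained by solving the linear system

\[ \begin{bmatrix} \lambda - 2 & -1 - i \\ -1 + i & \lambda - 3 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \]

with $\lambda = 1$ and with $\lambda = 4$. We leave it for you to do this and to show that the general solutions of these systems are

\[ \lambda = 1: \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = t\begin{bmatrix} -1 - i \\ 1 \end{bmatrix} \quad \text{and} \quad \lambda = 4: \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = t\begin{bmatrix} \frac{1}{2}(1 + i) \\ 1 \end{bmatrix} \]

Thus, bases for these eigenspaces are

\[ \lambda = 1: \mathbf{v}_1 = \begin{bmatrix} -1 - i \\ 1 \end{bmatrix} \quad \text{and} \quad \lambda = 4: \mathbf{v}_2 = \begin{bmatrix} \frac{1}{2}(1 + i) \\ 1 \end{bmatrix} \]

The vectors $\mathbf{v}_1$ and $\mathbf{v}_2$ are orthogonal since

\[ \mathbf{v}_1 \cdot \mathbf{v}_2 = (-1 - i)\overline{\left(\tfrac{1}{2}(1 + i)\right)} + (1)(1) = \tfrac{1}{2}(-1 - i)(1 - i) + 1 = 0 \]

and hence all scalar multiples of them are also orthogonal.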
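The eigenpairs and the orthogonality claim in this example can be confirmed with complex arithmetic. The sketch below is not part of the original text; the helper names `mat_vec` and `inner` are illustrative, and `inner` follows the convention $\mathbf{u} \cdot \mathbf{v} = \sum u_k\overline{v}_k$ used in the computation above.

```python
def mat_vec(a, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(entry * x for entry, x in zip(row, v)) for row in a]


def inner(u, v):
    """Complex Euclidean inner product: sum of u_k * conj(v_k)."""
    return sum(x * y.conjugate() for x, y in zip(u, v))


A = [[2, 1 + 1j],
     [1 - 1j, 3]]
v1 = [-1 - 1j, 1]
v2 = [(1 + 1j) / 2, 1]

print(mat_vec(A, v1) == [1 * x for x in v1])   # True: A v1 = 1 * v1
print(mat_vec(A, v2) == [4 * x for x in v2])   # True: A v2 = 4 * v2
print(inner(v1, v2) == 0)                      # True: v1 and v2 are orthogonal
```

The exact comparisons work here because every intermediate value is exactly representable; in general a tolerance would be safer.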


Unitary matrices are not usually easy to recognize by inspection. However, the 
following analog of Theorems 7.1.1 and 7.1.3, part of which is proved in the exercises, 
provides a way of ascertaining whether a matrix is unitary without computing its inverse. 


THEOREM 7.5.3 If A is an n × n matrix with complex entries, then the following are
equivalent.

(a) A is unitary.

(b) $\|A\mathbf{x}\| = \|\mathbf{x}\|$ for all x in $C^n$.

(c) $A\mathbf{x} \cdot A\mathbf{y} = \mathbf{x} \cdot \mathbf{y}$ for all x and y in $C^n$.

(d) The column vectors of A form an orthonormal set in $C^n$ with respect to the complex
Euclidean inner product.

(e) The row vectors of A form an orthonormal set in $C^n$ with respect to the complex
Euclidean inner product.
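Part (d) of the theorem suggests a simple numerical test: the columns of A are orthonormal exactly when A*A = I. The sketch below, in plain Python, applies that test and also spot-checks part (b) on one vector. The matrices `U` and `N` and the vector `x` are hypothetical test data chosen for illustration, not taken from the text.

```python
import math

def conj_transpose(A):
    """Conjugate transpose A* of a matrix given as a list of rows."""
    return [[complex(A[i][j]).conjugate() for i in range(len(A))]
            for j in range(len(A[0]))]

def matmul(A, B):
    """Matrix product AB."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def matvec(A, x):
    """Matrix-vector product Ax."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def norm(x):
    """Complex Euclidean norm."""
    return math.sqrt(sum(abs(c) ** 2 for c in x))

def is_unitary(A, tol=1e-12):
    """True if A*A equals the identity to within tol (columns orthonormal)."""
    n = len(A)
    P = matmul(conj_transpose(A), A)
    return all(abs(P[i][j] - (1 if i == j else 0)) < tol
               for i in range(n) for j in range(n))

s = 1 / math.sqrt(2)
U = [[s, s * 1j], [s * 1j, s]]   # hypothetical unitary matrix
N = [[1, 1j], [0, 1]]            # hypothetical non-unitary matrix
x = [1 + 1j, 2 - 1j]             # arbitrary test vector

print(is_unitary(U), is_unitary(N), norm(matvec(U, x)), norm(x))
```

A tolerance is used because the entries involve floating-point square roots; an exact test would need symbolic arithmetic.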


► EXAMPLE 5 A Unitary Matrix 

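Use Theorem 7.5.3 to show that

$$A = \begin{bmatrix} \tfrac{1}{2}(1+i) & \tfrac{1}{2}(1+i) \\ \tfrac{1}{2}(1-i) & \tfrac{1}{2}(-1+i) \end{bmatrix}$$

is unitary, and then find $A^{-1}$.

Solution We will show that the row vectors

$$\mathbf{r}_1 = \left[\tfrac{1}{2}(1+i) \quad \tfrac{1}{2}(1+i)\right] \quad\text{and}\quad \mathbf{r}_2 = \left[\tfrac{1}{2}(1-i) \quad \tfrac{1}{2}(-1+i)\right]$$

are orthonormal. The relevant computations are

$$\|\mathbf{r}_1\| = \sqrt{\left|\tfrac{1}{2}(1+i)\right|^2 + \left|\tfrac{1}{2}(1+i)\right|^2} = \sqrt{\tfrac{1}{2} + \tfrac{1}{2}} = 1$$

$$\|\mathbf{r}_2\| = \sqrt{\left|\tfrac{1}{2}(1-i)\right|^2 + \left|\tfrac{1}{2}(-1+i)\right|^2} = \sqrt{\tfrac{1}{2} + \tfrac{1}{2}} = 1$$

$$\mathbf{r}_1 \cdot \mathbf{r}_2 = \left(\tfrac{1}{2}(1+i)\right)\overline{\left(\tfrac{1}{2}(1-i)\right)} + \left(\tfrac{1}{2}(1+i)\right)\overline{\left(\tfrac{1}{2}(-1+i)\right)} = \left(\tfrac{1}{2}(1+i)\right)\left(\tfrac{1}{2}(1+i)\right) + \left(\tfrac{1}{2}(1+i)\right)\left(\tfrac{1}{2}(-1-i)\right) = \tfrac{1}{2}i - \tfrac{1}{2}i = 0$$

Since we now know that A is unitary, it follows that

$$A^{-1} = A^* = \begin{bmatrix} \tfrac{1}{2}(1-i) & \tfrac{1}{2}(1+i) \\ \tfrac{1}{2}(1-i) & \tfrac{1}{2}(-1-i) \end{bmatrix}$$

You can confirm the validity of this result by showing that $AA^* = A^*A = I$.


Unitary Diagonalizability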


Since unitary matrices are the complex analogs of the real orthogonal matrices, the 
following definition is a natural generalization of orthogonal diagonalizability for real 
matrices. 


DEFINITION 3 A square complex matrix A is said to be unitarily diagonalizable if 
there is a unitary matrix P such that P*AP = D is a complex diagonal matrix. Any 
such matrix P is said to unitarily diagonalize A. 


Recall that a real symmetric n x n matrix A has an orthonormal set of n eigenvectors 
and is orthogonally diagonalized by any n x n matrix whose column vectors are an 
orthonormal set of eigenvectors of A. Here is the complex analog of that result. 


THEOREM 7.5.4 Every n × n Hermitian matrix A has an orthonormal set of n eigenvectors
and is unitarily diagonalized by any n × n matrix P whose column vectors form
an orthonormal set of eigenvectors of A.


The procedure for unitarily diagonalizing a Hermitian matrix A is exactly the same 
as that for orthogonally diagonalizing a symmetric matrix: 


Unitarily Diagonalizing a Hermitian Matrix 

Step 1. Find a basis for each eigenspace of A. 

Step 2. Apply the Gram-Schmidt process to each of these bases to obtain orthonormal
bases for the eigenspaces.

Step 3. Form the matrix P whose column vectors are the basis vectors obtained in
Step 2. This will be a unitary matrix (Theorem 7.5.3) and will unitarily
diagonalize A.
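The three steps above can be sketched in plain Python for a 2 × 2 case. This is an illustration under stated assumptions: it uses the Hermitian matrix of Example 4, takes the eigenspace bases from that example as given (Step 1), and relies on each eigenspace being one-dimensional so that Gram-Schmidt reduces to normalization (Step 2). The helper names are hypothetical.

```python
import math

def norm(v):
    """Complex Euclidean norm."""
    return math.sqrt(sum(abs(c) ** 2 for c in v))

def normalize(v):
    """Scale v to unit length (Gram-Schmidt for a single basis vector)."""
    n = norm(v)
    return [c / n for c in v]

def conj_transpose(A):
    """Conjugate transpose A*."""
    return [[complex(A[i][j]).conjugate() for i in range(len(A))]
            for j in range(len(A[0]))]

def matmul(A, B):
    """Matrix product AB."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

A = [[2, 1 + 1j], [1 - 1j, 3]]

# Step 1: eigenspace bases, taken from Example 4 (lambda = 1 and lambda = 4).
v1 = [-1 - 1j, 1]
v2 = [0.5 * (1 + 1j), 1]

# Step 2: orthonormal bases for the eigenspaces.
p1, p2 = normalize(v1), normalize(v2)

# Step 3: P has p1 and p2 as its columns.
P = [[p1[0], p2[0]], [p1[1], p2[1]]]

D = matmul(matmul(conj_transpose(P), A), P)   # expected: diag(1, 4) up to rounding
print(D)
```

For an eigenspace of dimension greater than one, Step 2 would need a full Gram-Schmidt pass rather than a single normalization.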


► EXAMPLE 6 Unitary Diagonalization of a Hermitian Matrix 

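Find a matrix P that unitarily diagonalizes the Hermitian matrix

$$A = \begin{bmatrix} 2 & 1+i \\ 1-i & 3 \end{bmatrix}$$

Solution We showed in Example 4 that the eigenvalues of A are $\lambda = 1$ and $\lambda = 4$ and
that bases for the corresponding eigenspaces are

$$\lambda = 1: \; \mathbf{v}_1 = \begin{bmatrix} -1-i \\ 1 \end{bmatrix} \quad\text{and}\quad \lambda = 4: \; \mathbf{v}_2 = \begin{bmatrix} \tfrac{1}{2}(1+i) \\ 1 \end{bmatrix}$$

Since each eigenspace has only one basis vector, the Gram-Schmidt process is simply a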
matter of normalizing these basis vectors. We leave it for you to show that

$$\mathbf{p}_1 = \frac{\mathbf{v}_1}{\|\mathbf{v}_1\|} = \begin{bmatrix} \dfrac{-1-i}{\sqrt{3}} \\[2mm] \dfrac{1}{\sqrt{3}} \end{bmatrix} \quad\text{and}\quad \mathbf{p}_2 = \frac{\mathbf{v}_2}{\|\mathbf{v}_2\|} = \begin{bmatrix} \dfrac{1+i}{\sqrt{6}} \\[2mm] \dfrac{2}{\sqrt{6}} \end{bmatrix}$$

Thus, A is unitarily diagonalized by the matrix

$$P = [\mathbf{p}_1 \quad \mathbf{p}_2] = \begin{bmatrix} \dfrac{-1-i}{\sqrt{3}} & \dfrac{1+i}{\sqrt{6}} \\[2mm] \dfrac{1}{\sqrt{3}} & \dfrac{2}{\sqrt{6}} \end{bmatrix}$$

Although it is a little tedious, you may want to check this result by showing that

$$P^*AP = \begin{bmatrix} \dfrac{-1+i}{\sqrt{3}} & \dfrac{1}{\sqrt{3}} \\[2mm] \dfrac{1-i}{\sqrt{6}} & \dfrac{2}{\sqrt{6}} \end{bmatrix} \begin{bmatrix} 2 & 1+i \\ 1-i & 3 \end{bmatrix} \begin{bmatrix} \dfrac{-1-i}{\sqrt{3}} & \dfrac{1+i}{\sqrt{6}} \\[2mm] \dfrac{1}{\sqrt{3}} & \dfrac{2}{\sqrt{6}} \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 4 \end{bmatrix}$$


Skew-Symmetric and Skew-Hermitian Matrices

We will now consider two more classes of matrices that play a role in the analysis of
the diagonalization problem. A square real matrix A is said to be skew-symmetric if
$A^T = -A$, and a square complex matrix A is said to be skew-Hermitian if $A^* = -A$.
We leave it as an exercise to show that a skew-symmetric matrix must have zeros on
the main diagonal, and a skew-Hermitian matrix must have zeros or pure imaginary
numbers on the main diagonal. Here are two examples:

$$A = \begin{bmatrix} 0 & 1 & -2 \\ -1 & 0 & 4 \\ 2 & -4 & 0 \end{bmatrix} \;\text{[skew-symmetric]} \qquad A = \begin{bmatrix} i & 1-i & 5 \\ -1-i & 2i & i \\ -5 & i & 0 \end{bmatrix} \;\text{[skew-Hermitian]}$$


Normal Matrices

Hermitian matrices enjoy many, but not all, of the properties of real symmetric matrices.
For example, we know that real symmetric matrices are orthogonally diagonalizable and
Hermitian matrices are unitarily diagonalizable. However, whereas the real symmetric
matrices are the only orthogonally diagonalizable matrices, the Hermitian matrices do
not constitute the entire class of unitarily diagonalizable complex matrices. Specifically,
it can be proved that a square complex matrix A is unitarily diagonalizable if and only if

$$AA^* = A^*A \tag{6}$$

Matrices with this property are said to be normal. Normal matrices include the Hermitian,
skew-Hermitian, and unitary matrices in the complex case and the symmetric,
skew-symmetric, and orthogonal matrices in the real case. The nonzero skew-symmetric
matrices are particularly interesting because they are examples of real matrices that are
not orthogonally diagonalizable but are unitarily diagonalizable.


A Comparison of Eigenvalues

We have seen that Hermitian matrices have real eigenvalues. In the exercises we will ask
you to show that the eigenvalues of a skew-Hermitian matrix are either zero or purely
imaginary (have real part of zero) and that the eigenvalues of unitary matrices have
modulus 1. These ideas are illustrated schematically in Figure 7.5.1.
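Condition (6) and the eigenvalue comparison above can be spot-checked on 2 × 2 matrices, where the eigenvalues are the roots of $\lambda^2 - \mathrm{tr}(A)\lambda + \det(A) = 0$. The sketch below is plain Python; the matrices `S`, `U`, and `T` are hypothetical examples chosen for illustration, and `H` is the Hermitian matrix of Example 4.

```python
import cmath

def conj_transpose(A):
    """Conjugate transpose A*."""
    return [[complex(A[i][j]).conjugate() for i in range(len(A))]
            for j in range(len(A[0]))]

def matmul(A, B):
    """Matrix product AB."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def is_normal(A, tol=1e-12):
    """True if AA* equals A*A to within tol, as in condition (6)."""
    As = conj_transpose(A)
    L, R = matmul(A, As), matmul(As, A)
    return all(abs(L[i][j] - R[i][j]) < tol
               for i in range(len(A)) for j in range(len(A)))

def eigenvalues_2x2(A):
    """Roots of lambda^2 - tr(A) lambda + det(A) = 0 for a 2 x 2 matrix."""
    tr = A[0][0] + A[1][1]
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    disc = cmath.sqrt(tr * tr - 4 * det)
    return (tr + disc) / 2, (tr - disc) / 2

H = [[2, 1 + 1j], [1 - 1j, 3]]   # Hermitian: real eigenvalues expected
S = [[0, 1], [-1, 0]]            # real skew-symmetric: pure imaginary eigenvalues expected
U = [[0, 1j], [1j, 0]]           # unitary: eigenvalues of modulus 1 expected
T = [[1, 1], [0, 1]]             # not normal

for name, M in (("H", H), ("S", S), ("U", U), ("T", T)):
    print(name, is_normal(M), eigenvalues_2x2(M))
```

The matrix `S` illustrates the closing remark of the Normal Matrices discussion: it is real and normal, yet its eigenvalues are not real.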


[Figure: the complex plane, showing real eigenvalues (Hermitian) on the real axis, pure imaginary eigenvalues (skew-Hermitian) on the imaginary axis, and eigenvalues with $|\lambda| = 1$ (unitary) on the unit circle.]

► Figure 7.5.1


Exercise Set 7.5

In Exercises 1-2, find A*. 
2 i I - f 

1. A = 4 3 + ;' 

5 + i 0 


2. A = 


'2; 1 — i — 1 + i 

4 5-7 i -i 


In Exercises 3-4, substitute numbers for the ×'s so that A is
Hermitian.


3. A = 


In Exercises 5-6, show that A is not Hermitian for any choice 
of the x ’s. 


" 1 ;' 2 - 3 i 


~2 

0 

3 4-5;' 

x -3 1 

4. A = 

X 

-4 

— ; 

1 

(N 

X 

X 

1 


X 

X 

6 


5. (a) A = 


(b) A = 


1 

i 2 - : 

—i 

— 3 x 

2-3 ;' 

X X 

X 

x 3 + 5/ 

0 

i —i 

3-5;' 

i x 



1 

14-;' x 

(a) A = 

1 4- i 

7 x 


6-2 i 

x 0 


1 

x 3 4-5; 

(b) A = 

X 

3 1 -i 


3-5;' 

x 2 4-;' 


'3 2 - 3; 

2 4-3;' -1 


8. A = 


' 0 
—2 ; 


7. A = 

2 + il -1 J _—Zl 1_ 

In Exercises 9-12, show that A is unitary, and find $A^{-1}$.


9. A = 


5 5 

4 3 



1 

1 

II 

© 

V2 

V2 


-5(1 + 0 

5(1 + 0 


11. A = 

m + 0 

il/i 


_+(! + ;'V3) 

571 (' " 

12. A = 

+ (-1 + 0 



L 73 

7/6 j 


13. A = 


' 4 
1 + i 


1 — ;' 
5 


14. A = 


3 -;' 

i 3 


15. A = 


17. A = 


18. A = 


6 2 4- 2; 

2-2 i 4 


16. A = 


0 3 4-;" 

3 -i -3 


5 0 0 

0 -1 - 14 -;' 

0 - 1 -;' 0 


2 75 * Vi' 


V2 ! 

L + 0 


In Exercises 19-20, substitute numbers for the x ’s so that A is 
skew-Hermitian. 


19. A = 


In Exercises 21-22, show that A is not skew-Hermitian for any 
choice of the x ’s. 


"o i 

2 - 3;" 


"o 

0 3 - 5 i 

x 0 

1 

ts> 

© 

II 

X 

0 — ; 

X X 

4 ;' 


X 

x 0 


In Exercises 7-8, verify that the eigenvalues of the Hermitian 
matrix A are real and that eigenvectors from different eigenspaces 
are orthogonal (see Theorem 7.5.2). 


21. (a) A = 


(b) A = 


22. (a) A = 


(b) A = 


0 i 2— 3; 

— ;' 0 x 

2 4-3; x x 


1 x 

x 2 ;' 

— 3 4- 5 i i 

i x 

x 0 

2 4-3; — 1 — ;' 


3-5; 
—i 
3 i 


2-3 i 
14-;' 
x 


0 — ;' 4 4- 7; 

x Ox 

-4 -7 ;' x 1 


In Exercises 23-24, verify that the eigenvalues of the skew- 
Hermitian matrix A are pure imaginary numbers. 


23. A = 


0 - 14 -;" 

14-;' ;' 


24. A = 


0 3;" 

3 i 0 


In Exercises 25-26, show that A is normal. 


In Exercises 13-18, find a unitary matrix P that diagonalizes
the Hermitian matrix A, and determine $P^{-1}AP$.


25. A = 


26. A = 


1+2/ 

2 + / 

-2-i 

2 + / 

1 + / 

—i 

-2-i 

— / 

1 + / 

2 + 2/ 

/ 

1 — / 

/ 

-2/ 

1-3/ 

1 — / 

1 - 3/ 

-3 + 8/ 




27. Let A be any n × n matrix with complex entries, and define
the matrices B and C to be

$$B = \tfrac{1}{2}(A + A^*) \quad\text{and}\quad C = \tfrac{1}{2i}(A - A^*)$$

(a) Show that B and C are Hermitian. 

(b) Show that A = B + iC and A* = B — iC . 

(c) What condition must B and C satisfy for A to be normal? 


38. Prove that each entry on the main diagonal of a skew- 
Hermitian matrix is either zero or a pure imaginary number. 

39. Prove that if A is a unitary matrix, then so is A*. 

40. Prove that the eigenvalues of a skew-Hermitian matrix are 
either zero or pure imaginary. 

41. Prove that the eigenvalues of a unitary matrix have modulus 1 . 


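28. Show that if A is an n × n matrix with complex entries, and if
u and v are vectors in $C^n$ that are expressed in column form,
then

$$A\mathbf{u} \cdot \mathbf{v} = \mathbf{u} \cdot A^*\mathbf{v} \quad\text{and}\quad \mathbf{u} \cdot A\mathbf{v} = A^*\mathbf{u} \cdot \mathbf{v}$$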


29. Show that 


A 


1 I" e w 

Vi [ie w 



42. Prove that if u is a nonzero vector in $C^n$ that is expressed in
column form, then $P = \mathbf{u}\mathbf{u}^*$ is Hermitian.

43. Prove that if u is a unit vector in $C^n$ that is expressed in column
form, then $H = I - 2\mathbf{u}\mathbf{u}^*$ is Hermitian and unitary.

44. Prove that if A is an invertible matrix, then $A^*$ is invertible,
and $(A^*)^{-1} = (A^{-1})^*$.


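is unitary for all real values of $\theta$. [Note: See Formula (17) in
Appendix B for the definition of $e^{i\theta}$.]

30. Show that

$$A = \begin{bmatrix} \alpha + i\gamma & -\beta + i\delta \\ \beta + i\delta & \alpha - i\gamma \end{bmatrix}$$

is unitary if $\alpha^2 + \beta^2 + \gamma^2 + \delta^2 = 1$.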


31. Let A be the unitary matrix in Exercise 9, and verify that the 
conclusions in parts ( b ) and (c) of Theorem 7.5.3 hold for the 
vectors x = (1 + i, 2 — i) and y = (1, 1 — i). 

32. Let $T_A: C^2 \to C^2$ be multiplication by the Hermitian matrix A
in Exercise 14, and find two orthogonal unit vectors $\mathbf{u}_1$ and $\mathbf{u}_2$
for which $T_A(\mathbf{u}_1)$ and $T_A(\mathbf{u}_2)$ are orthogonal.


45. (a) Prove that $\det(\bar{A}) = \overline{\det(A)}$.

(b) Use the result in part (a) and the fact that a square matrix
and its transpose have the same determinant to prove that
$\det(A^*) = \overline{\det(A)}$.

46. Use part (b) of Exercise 45 to prove: 

(a) If A is Hermitian, then det(A) is real. 

(b) If A is unitary, then | det(A)| = 1. 

47. Prove that an n × n matrix A with complex entries is unitary if
and only if the columns of A form an orthonormal set in $C^n$.

48. Prove that the eigenvalues of a Hermitian matrix are real. 


33. Under what conditions is the following matrix normal? 


a 


A = 


0 

0 


0 

0 

b 


0 

c 

0 


34. What relationship must exist between a matrix and its inverse 
if it is both Hermitian and unitary? 


35. Find a 2 x 2 matrix that is both Hermitian and unitary and 
whose entries are not all real numbers. 


Working with Proofs 

36. Use properties of the transpose and complex conjugate to 
prove parts (b) and ( d ) of Theorem 7.5.1. 


True-False Exercises 


TF. In parts (a)-(e) determine whether the statement is true or 
false, and justify your answer. 

"o 


(a) The matrix 


is Hermitian. 



v/2 

n 

V3 


(b) The matrix 

0 

~T6 

i 

V3 

is unitary. 


i 

- V2 


i 

V3- 



(c) The conjugate transpose of a unitary matrix is unitary. 


(d) Every unitarily diagonalizable matrix is Hermitian. 


37. Use properties of the transpose and complex conjugate to 
prove parts (a) and (e) of Theorem 7.5.1. 


(e) A positive integer power of a skew-Hermitian matrix is skew-Hermitian.


Supplementary Exercises


1 . Verify that each matrix is orthogonal, and find its inverse. 10 . Find a unitary matrix U that diagonalizes 


'3 

4" 


r 4 

5 

0 

3 1 

5 


"l 

1 

0" 

5 

4 

5 

3 

(b) 

9 

25 

4 

5 

12 

25 

A = 

0 

1 

1 

_ 5 

5 _ 


12 

3 

16 


1 

0 

1 




L 25 

5 

25 -1 






2 . Prove: If Q is an orthogonal matrix, then each entry of Q is and determine the diagonal matrix D = U~ l AU. 
the same as its cofactor if det(2) = 1 and is the negative of 

its cofactor if det(< 2 ) = — 1. 11 . Show that if U is an n x n unitary matrix and 


3. Prove that if A is a positive definite symmetric matrix, and if
u and v are vectors in $R^n$ in column form, then
$\langle \mathbf{u}, \mathbf{v} \rangle = \mathbf{u}^T A \mathbf{v}$
is an inner product on $R^n$.


4 . Find the characteristic polynomial and the dimensions of the 
eigenspaces of the symmetric matrix 

"3 2 2 

2 3 2 

2 2 3 


5 . Find a matrix P that orthogonally diagonalizes 


A = 


1 

0 

1 


0 1 
1 0 
0 1 


kll = |Z2l = ■ ••= |Znl = 1 

then the product 

'zi 0 0 ••• O' 

0 z 2 0 • • • 0 

_0 0 0 ■■■ z n _ 

is also unitary. 

12 . Suppose that A is skew-Hermitian. 

(a) Show that iA is Hermitian. 

(b) Show that A is unitarily diagonalizable and has pure imaginary
eigenvalues.


and determine the diagonal matrix D = P T AP. 


13 . Find a , b , and c for which the matrix 


6. Express each quadratic form in the matrix notation $\mathbf{x}^T A \mathbf{x}$.

(a) $-4x_1^2 + 16x_2^2 - 15x_1x_2$

(b) $9x_1^2 - x_2^2 + 4x_3^2 + 6x_1x_2 - 8x_1x_3 + x_2x_3$

7. Classify the quadratic form

$$x_1^2 - 3x_1x_2 + 4x_2^2$$

as positive definite, negative definite, indefinite, positive semidefinite,
or negative semidefinite.

8. Find an orthogonal change of variable that eliminates the
cross product terms in each quadratic form, and express the
quadratic form in terms of the new variables.

(a) $-3x_1^2 + 5x_2^2 + 2x_1x_2$

(b) $-5x_1^2 + x_2^2 - x_3^2 + 6x_1x_3 + 4x_1x_2$

9. Identify the type of conic section represented by each equation.

(a) $y - x^2 = 0$ (b) $3x - 11y^2 = 0$


a 

V2 

v/2 

b 

1 

V6 

1 

V6 

c 

1 

V 3 

1 


is orthogonal. Are the values of a , b, and c unique? Explain. 

14. In each part, suppose that A is a 4 × 4 matrix in which $\det(M_j)$
is the determinant of the jth principal submatrix of A. Determine
whether A is positive definite, negative definite, or
indefinite.

(a) $\det(M_1) < 0$, $\det(M_2) > 0$, $\det(M_3) < 0$, $\det(M_4) > 0$

(b) $\det(M_1) > 0$, $\det(M_2) > 0$, $\det(M_3) > 0$, $\det(M_4) > 0$

(c) $\det(M_1) < 0$, $\det(M_2) < 0$, $\det(M_3) < 0$, $\det(M_4) < 0$

(d) $\det(M_1) > 0$, $\det(M_2) < 0$, $\det(M_3) > 0$, $\det(M_4) < 0$

(e) $\det(M_1) = 0$, $\det(M_2) < 0$, $\det(M_3) = 0$, $\det(M_4) > 0$

(f) $\det(M_1) = 0$, $\det(M_2) > 0$, $\det(M_3) = 0$, $\det(M_4) = 0$


INTRODUCTION In earlier sections we studied linear transformations from $R^n$ to $R^m$. In this chapter we
will define and study linear transformations from a general vector space V to a general
vector space W. The results we will obtain here have important applications in physics,
engineering, and various branches of mathematics.


8.1 General Linear Transformations 

Up to now our study of linear transformations has focused on transformations from $R^n$ to
$R^m$. In this section we will turn our attention to linear transformations involving general
vector spaces. We will illustrate ways in which such transformations arise, and we will
establish a fundamental relationship between general n-dimensional vector spaces and $R^n$.


Definitions and 
Terminology 


In Section 1.8 we defined a matrix transformation $T_A: R^n \to R^m$ to be a mapping of the
form

$$T_A(\mathbf{x}) = A\mathbf{x}$$

in which A is an m × n matrix. We subsequently established in Theorem 1.8.3 that the
matrix transformations are precisely the linear transformations from $R^n$ to $R^m$, that is,
the transformations with the linearity properties

$$T(\mathbf{u} + \mathbf{v}) = T(\mathbf{u}) + T(\mathbf{v}) \quad\text{and}\quad T(k\mathbf{u}) = kT(\mathbf{u})$$


We will use these two properties as the starting point for defining more general linear 
transformations. 


DEFINITION 1 If $T: V \to W$ is a mapping from a vector space V to a vector space W,
then T is called a linear transformation from V to W if the following two properties
hold for all vectors u and v in V and for all scalars k:

(i) $T(k\mathbf{u}) = kT(\mathbf{u})$ [Homogeneity property]

(ii) $T(\mathbf{u} + \mathbf{v}) = T(\mathbf{u}) + T(\mathbf{v})$ [Additivity property]

In the special case where V = W, the linear transformation T is called a linear operator
on the vector space V.

The homogeneity and additivity properties of a linear transformation $T: V \to W$
can be used in combination to show that if $\mathbf{v}_1$ and $\mathbf{v}_2$ are vectors in V and $k_1$ and $k_2$ are
any scalars, then

$$T(k_1\mathbf{v}_1 + k_2\mathbf{v}_2) = k_1T(\mathbf{v}_1) + k_2T(\mathbf{v}_2)$$

More generally, if $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r$ are vectors in V and $k_1, k_2, \ldots, k_r$ are any scalars, then

$$T(k_1\mathbf{v}_1 + k_2\mathbf{v}_2 + \cdots + k_r\mathbf{v}_r) = k_1T(\mathbf{v}_1) + k_2T(\mathbf{v}_2) + \cdots + k_rT(\mathbf{v}_r) \tag{1}$$


The following theorem is an analog of parts (a) and (d) of Theorem 1.8.2.


THEOREM 8.1.1 If $T: V \to W$ is a linear transformation, then:

(a) $T(\mathbf{0}) = \mathbf{0}$.

(b) $T(\mathbf{u} - \mathbf{v}) = T(\mathbf{u}) - T(\mathbf{v})$ for all u and v in V.


Use the two parts of Theorem
8.1.1 to prove that

$$T(-\mathbf{v}) = -T(\mathbf{v})$$

for all v in V.


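Proof Let u be any vector in V. Since $0\mathbf{u} = \mathbf{0}$, it follows from the homogeneity property
in Definition 1 that

$$T(\mathbf{0}) = T(0\mathbf{u}) = 0T(\mathbf{u}) = \mathbf{0}$$

which proves (a).

We can prove part (b) by rewriting $T(\mathbf{u} - \mathbf{v})$ as

$$T(\mathbf{u} - \mathbf{v}) = T(\mathbf{u} + (-1)\mathbf{v}) = T(\mathbf{u}) + (-1)T(\mathbf{v}) = T(\mathbf{u}) - T(\mathbf{v})$$

We leave it for you to justify each step.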


► EXAMPLE 1 Matrix Transformations 

Because we have based the definition of a general linear transformation on the homogeneity
and additivity properties of matrix transformations, it follows that every matrix
transformation $T_A: R^n \to R^m$ is also a linear transformation in this more general sense
with $V = R^n$ and $W = R^m$.


► EXAMPLE 2 The Zero Transformation 

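Let V and W be any two vector spaces. The mapping $T: V \to W$ such that $T(\mathbf{v}) = \mathbf{0}$ for
every v in V is a linear transformation called the zero transformation. To see that T is
linear, observe that

$$T(\mathbf{u} + \mathbf{v}) = \mathbf{0}, \quad T(\mathbf{u}) = \mathbf{0}, \quad T(\mathbf{v}) = \mathbf{0}, \quad\text{and}\quad T(k\mathbf{u}) = \mathbf{0}$$

Therefore,

$$T(\mathbf{u} + \mathbf{v}) = T(\mathbf{u}) + T(\mathbf{v}) \quad\text{and}\quad T(k\mathbf{u}) = kT(\mathbf{u})$$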


► EXAMPLE 3 The Identity Operator 

Let V be any vector space. The mapping $I: V \to V$ defined by $I(\mathbf{v}) = \mathbf{v}$ is called the
identity operator on V. We will leave it for you to verify that I is linear.

► EXAMPLE 4 Dilation and Contraction Operators

If V is a vector space and k is any scalar, then the mapping $T: V \to V$ given by $T(\mathbf{x}) = k\mathbf{x}$
is a linear operator on V, for if c is any scalar and if u and v are any vectors in V, then

$$T(c\mathbf{u}) = k(c\mathbf{u}) = c(k\mathbf{u}) = cT(\mathbf{u})$$

$$T(\mathbf{u} + \mathbf{v}) = k(\mathbf{u} + \mathbf{v}) = k\mathbf{u} + k\mathbf{v} = T(\mathbf{u}) + T(\mathbf{v})$$

If 0 < k < 1, then T is called the contraction of V with factor k, and if k > 1, it is called
the dilation of V with factor k.

► EXAMPLE 5 A Linear Transformation from $P_n$ to $P_{n+1}$

Let $\mathbf{p} = p(x) = c_0 + c_1x + \cdots + c_nx^n$ be a polynomial in $P_n$, and define the transformation
$T: P_n \to P_{n+1}$ by

$$T(\mathbf{p}) = T(p(x)) = xp(x) = c_0x + c_1x^2 + \cdots + c_nx^{n+1}$$

This transformation is linear because for any scalar k and any polynomials $\mathbf{p}_1$ and $\mathbf{p}_2$ in
$P_n$ we have

$$T(k\mathbf{p}) = T(kp(x)) = x(kp(x)) = k(xp(x)) = kT(\mathbf{p})$$

and

$$T(\mathbf{p}_1 + \mathbf{p}_2) = T(p_1(x) + p_2(x)) = x(p_1(x) + p_2(x)) = xp_1(x) + xp_2(x) = T(\mathbf{p}_1) + T(\mathbf{p}_2)$$

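► EXAMPLE 6 A Linear Transformation Using the Dot Product

Let $\mathbf{v}_0$ be any fixed vector in $R^n$, and let $T: R^n \to R$ be the transformation

$$T(\mathbf{x}) = \mathbf{x} \cdot \mathbf{v}_0$$

that maps a vector x to its dot product with $\mathbf{v}_0$. This transformation is linear, for if k is
any scalar, and if u and v are any vectors in $R^n$, then it follows from properties of the dot
product in Theorem 3.2.2 that

$$T(k\mathbf{u}) = (k\mathbf{u}) \cdot \mathbf{v}_0 = k(\mathbf{u} \cdot \mathbf{v}_0) = kT(\mathbf{u})$$

$$T(\mathbf{u} + \mathbf{v}) = (\mathbf{u} + \mathbf{v}) \cdot \mathbf{v}_0 = (\mathbf{u} \cdot \mathbf{v}_0) + (\mathbf{v} \cdot \mathbf{v}_0) = T(\mathbf{u}) + T(\mathbf{v})$$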

► EXAMPLE 7 Transformations on Matrix Spaces 

Let M nn be the vector space of n x n matrices. In each part determine whether the 
transformation is linear. 

(a) $T_1(A) = A^T$ (b) $T_2(A) = \det(A)$

Solution (a) It follows from parts (b) and (d) of Theorem 1.4.8 that

$$T_1(kA) = (kA)^T = kA^T = kT_1(A)$$

$$T_1(A + B) = (A + B)^T = A^T + B^T = T_1(A) + T_1(B)$$

so $T_1$ is linear.

Solution (b) It follows from Formula (1) of Section 2.3 that

$$T_2(kA) = \det(kA) = k^n\det(A) = k^nT_2(A)$$

Thus, $T_2$ is not homogeneous and hence not linear if n > 1. Note that additivity also fails
because we showed in Example 1 of Section 2.3 that $\det(A + B)$ and $\det(A) + \det(B)$
are not generally equal.

▲ Figure 8.1.1 $T(\mathbf{x}) = \mathbf{x} + \mathbf{x}_0$ translates each point x along a
line parallel to $\mathbf{x}_0$ through a distance $\|\mathbf{x}_0\|$.


► EXAMPLE 8 Translation Is Not Linear

Part (a) of Theorem 8.1.1 states that a linear transformation maps 0 to 0. This property
is useful for identifying transformations that are not linear. For example, if $\mathbf{x}_0$ is a fixed
nonzero vector in $R^2$, then the transformation

$$T(\mathbf{x}) = \mathbf{x} + \mathbf{x}_0$$

has the geometric effect of translating each point x in a direction parallel to $\mathbf{x}_0$ through a
distance of $\|\mathbf{x}_0\|$ (Figure 8.1.1). This cannot be a linear transformation since $T(\mathbf{0}) = \mathbf{x}_0$,
so T does not map 0 to 0.
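Examples 7 and 8 lend themselves to a quick numerical spot check. The sketch below, in plain Python, tests the two linearity properties for the transpose and the determinant on 2 × 2 matrices and confirms that a translation moves the zero vector. The matrices `A` and `B`, the scalar `k`, and the vector `x0` are arbitrary values assumed for illustration; passing on one sample does not prove linearity, but a single failure does disprove it.

```python
def transpose(A):
    """Transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*A)]

def det2(A):
    """Determinant of a 2 x 2 matrix."""
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]

def add(A, B):
    """Entrywise sum A + B."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def scale(k, A):
    """Scalar multiple kA."""
    return [[k * a for a in row] for row in A]

def translate(x, x0=(1, 2)):
    """T(x) = x + x0 for a fixed nonzero x0 (assumed value)."""
    return tuple(a + b for a, b in zip(x, x0))

A = [[1, 2], [3, 4]]
B = [[0, 1], [5, -2]]
k = 3

# Transpose: both properties hold on this sample.
transpose_homogeneous = transpose(scale(k, A)) == scale(k, transpose(A))
transpose_additive = transpose(add(A, B)) == add(transpose(A), transpose(B))

# Determinant: det(kA) = k^2 det(A) for n = 2, so homogeneity fails.
det_homogeneous = det2(scale(k, A)) == k * det2(A)
det_additive = det2(add(A, B)) == det2(A) + det2(B)

# Translation: T(0) is x0, not 0.
image_of_zero = translate((0, 0))

print(transpose_homogeneous, transpose_additive,
      det_homogeneous, det_additive, image_of_zero)
```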


► EXAMPLE 9 The Evaluation Transformation 

Let V be a subspace of $F(-\infty, \infty)$, let

$$x_1, x_2, \ldots, x_n$$

be a sequence of distinct real numbers, and let $T: V \to R^n$ be the transformation

$$T(f) = (f(x_1), f(x_2), \ldots, f(x_n)) \tag{2}$$

that associates with f the n-tuple of function values at $x_1, x_2, \ldots, x_n$. We call this the
evaluation transformation on V at $x_1, x_2, \ldots, x_n$. Thus, for example, if

$$x_1 = -1, \quad x_2 = 2, \quad x_3 = 4$$

and if $f(x) = x^2 - 1$, then

$$T(f) = (f(x_1), f(x_2), f(x_3)) = (0, 3, 15)$$

The evaluation transformation in (2) is linear, for if k is any scalar, and if f and g
are any functions in V, then

$$T(kf) = ((kf)(x_1), (kf)(x_2), \ldots, (kf)(x_n)) = (kf(x_1), kf(x_2), \ldots, kf(x_n)) = k(f(x_1), f(x_2), \ldots, f(x_n)) = kT(f)$$

and

$$T(f + g) = ((f + g)(x_1), (f + g)(x_2), \ldots, (f + g)(x_n)) = (f(x_1) + g(x_1), f(x_2) + g(x_2), \ldots, f(x_n) + g(x_n))$$

$$= (f(x_1), f(x_2), \ldots, f(x_n)) + (g(x_1), g(x_2), \ldots, g(x_n)) = T(f) + T(g)$$ ◄
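The evaluation transformation of Example 9 is easy to model in code. The following sketch, in plain Python, builds T for the points −1, 2, 4, reproduces the value computed in the example, and checks the two linearity properties on one sample. The function `g` and the scalar `k` are hypothetical choices made for illustration.

```python
def evaluation(points):
    """Return the evaluation transformation T(f) = (f(x_1), ..., f(x_n))."""
    def T(f):
        return tuple(f(x) for x in points)
    return T

T = evaluation([-1, 2, 4])

f = lambda x: x ** 2 - 1      # the function used in Example 9
g = lambda x: 3 * x + 2       # hypothetical second function
k = 5                         # hypothetical scalar

Tf = T(f)                                     # (0, 3, 15), as in the example
T_kf = T(lambda x: k * f(x))                  # T(kf)
T_sum = T(lambda x: f(x) + g(x))              # T(f + g)

k_Tf = tuple(k * y for y in Tf)               # kT(f)
Tf_plus_Tg = tuple(a + b for a, b in zip(Tf, T(g)))   # T(f) + T(g)

print(Tf, T_kf == k_Tf, T_sum == Tf_plus_Tg)
```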


Finding Linear 
Transformations from 
Images of Basis Vectors 


We saw in Formula (15) of Section 1.8 that if $T_A: R^n \to R^m$ is multiplication by A, and
if $\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n$ are the standard basis vectors for $R^n$, then A can be expressed as

$$A = [T(\mathbf{e}_1) \mid T(\mathbf{e}_2) \mid \cdots \mid T(\mathbf{e}_n)]$$

It follows from this that the image of any vector $\mathbf{v} = (c_1, c_2, \ldots, c_n)$ in $R^n$ under
multiplication by A can be expressed as

$$T_A(\mathbf{v}) = c_1T_A(\mathbf{e}_1) + c_2T_A(\mathbf{e}_2) + \cdots + c_nT_A(\mathbf{e}_n)$$

This formula tells us that for a matrix transformation the image of any vector is expressible
as a linear combination of the images of the standard basis vectors. This is a special
case of the following more general result.

THEOREM 8.1.2 Let $T: V \to W$ be a linear transformation, where V is finite-dimensional.
If $S = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}$ is a basis for V, then the image of any vector v in V can
be expressed as

$$T(\mathbf{v}) = c_1T(\mathbf{v}_1) + c_2T(\mathbf{v}_2) + \cdots + c_nT(\mathbf{v}_n) \tag{3}$$

where $c_1, c_2, \ldots, c_n$ are the coefficients required to express v as a linear combination of
the vectors in the basis S.


Proof Express v as $\mathbf{v} = c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_n\mathbf{v}_n$ and use the linearity of T.


► EXAMPLE 10 Computing with Images of Basis Vectors 

Consider the basis $S = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ for $R^3$, where

$$\mathbf{v}_1 = (1, 1, 1), \quad \mathbf{v}_2 = (1, 1, 0), \quad \mathbf{v}_3 = (1, 0, 0)$$

Let $T: R^3 \to R^2$ be the linear transformation for which

$$T(\mathbf{v}_1) = (1, 0), \quad T(\mathbf{v}_2) = (2, -1), \quad T(\mathbf{v}_3) = (4, 3)$$

Find a formula for $T(x_1, x_2, x_3)$, and then use that formula to compute $T(2, -3, 5)$.

Solution We first need to express $\mathbf{x} = (x_1, x_2, x_3)$ as a linear combination of $\mathbf{v}_1$, $\mathbf{v}_2$, and
$\mathbf{v}_3$. If we write

$$(x_1, x_2, x_3) = c_1(1, 1, 1) + c_2(1, 1, 0) + c_3(1, 0, 0)$$

then on equating corresponding components, we obtain

$$\begin{aligned} c_1 + c_2 + c_3 &= x_1 \\ c_1 + c_2 &= x_2 \\ c_1 &= x_3 \end{aligned}$$

which yields $c_1 = x_3$, $c_2 = x_2 - x_3$, $c_3 = x_1 - x_2$, so

$$(x_1, x_2, x_3) = x_3(1, 1, 1) + (x_2 - x_3)(1, 1, 0) + (x_1 - x_2)(1, 0, 0) = x_3\mathbf{v}_1 + (x_2 - x_3)\mathbf{v}_2 + (x_1 - x_2)\mathbf{v}_3$$

Thus

$$\begin{aligned} T(x_1, x_2, x_3) &= x_3T(\mathbf{v}_1) + (x_2 - x_3)T(\mathbf{v}_2) + (x_1 - x_2)T(\mathbf{v}_3) \\ &= x_3(1, 0) + (x_2 - x_3)(2, -1) + (x_1 - x_2)(4, 3) \\ &= (4x_1 - 2x_2 - x_3, \; 3x_1 - 4x_2 + x_3) \end{aligned}$$

From this formula we obtain

$$T(2, -3, 5) = (9, 23)$$
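Formula (3) translates directly into code: find the coordinates of x relative to the basis, then form the same linear combination of the basis images. The sketch below, in plain Python, hard-codes the coordinates $c_1 = x_3$, $c_2 = x_2 - x_3$, $c_3 = x_1 - x_2$ derived in Example 10; the function name `T` and the list `images` are illustrative.

```python
# Images of the basis vectors v1 = (1,1,1), v2 = (1,1,0), v3 = (1,0,0) from Example 10.
images = [(1, 0), (2, -1), (4, 3)]

def T(x1, x2, x3):
    """Apply T by combining the basis images with the coordinates of x."""
    coeffs = [x3, x2 - x3, x1 - x2]   # c1, c2, c3 relative to the basis S
    return tuple(sum(c * img[i] for c, img in zip(coeffs, images))
                 for i in range(2))

print(T(2, -3, 5))   # (9, 23), matching the example
```

Evaluating `T` at the basis vectors themselves returns the prescribed images, which is a useful sanity check on the coordinate formulas.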


► EXAMPLE 11 (Calculus Required) A Linear Transformation from $C^1(-\infty, \infty)$ to $F(-\infty, \infty)$

Let $V = C^1(-\infty, \infty)$ be the vector space of functions with continuous first derivatives on
$(-\infty, \infty)$, and let $W = F(-\infty, \infty)$ be the vector space of all real-valued functions defined
on $(-\infty, \infty)$. Let $D: V \to W$ be the transformation that maps a function $\mathbf{f} = f(x)$ into
its derivative, that is,

$$D(\mathbf{f}) = f'(x)$$

From the properties of differentiation, we have

$$D(\mathbf{f} + \mathbf{g}) = D(\mathbf{f}) + D(\mathbf{g}) \quad\text{and}\quad D(k\mathbf{f}) = kD(\mathbf{f})$$

Thus, D is a linear transformation.


► EXAMPLE 12 (Calculus Required) An Integral Transformation

Let $V = C(-\infty, \infty)$ be the vector space of continuous functions on the interval $(-\infty, \infty)$,
let $W = C^1(-\infty, \infty)$ be the vector space of functions with continuous first derivatives on
$(-\infty, \infty)$, and let $J: V \to W$ be the transformation that maps a function f in V into

$$J(f) = \int_0^x f(t)\,dt$$

For example, if $f(x) = x^2$, then

$$J(f) = \int_0^x t^2\,dt = \left.\frac{t^3}{3}\right]_0^x = \frac{x^3}{3}$$

The transformation $J: V \to W$ is linear, for if k is any constant, and if f and g are any
functions in V, then properties of the integral imply that

$$J(kf) = \int_0^x kf(t)\,dt = k\int_0^x f(t)\,dt = kJ(f)$$

$$J(f + g) = \int_0^x (f(t) + g(t))\,dt = \int_0^x f(t)\,dt + \int_0^x g(t)\,dt = J(f) + J(g)$$ ◄


Kernel and Range

Recall that if A is an m × n matrix, then the null space of A consists of all vectors x
in $R^n$ such that $A\mathbf{x} = \mathbf{0}$, and by Theorem 4.7.1 the column space of A consists of all
vectors b in $R^m$ for which there is at least one vector x in $R^n$ such that $A\mathbf{x} = \mathbf{b}$. From
the viewpoint of matrix transformations, the null space of A consists of all vectors in $R^n$
that multiplication by A maps into 0, and the column space of A consists of all vectors in
$R^m$ that are images of at least one vector in $R^n$ under multiplication by A. The following
definition extends these ideas to general linear transformations.


DEFINITION 2 If $T: V \to W$ is a linear transformation, then the set of vectors in V
that T maps into 0 is called the kernel of T and is denoted by ker(T). The set of all
vectors in W that are images under T of at least one vector in V is called the range of
T and is denoted by R(T).


► EXAMPLE 13 Kernel and Range of a Matrix Transformation

If $T_A: R^n \to R^m$ is multiplication by the m × n matrix A, then, as discussed above, the
kernel of $T_A$ is the null space of A, and the range of $T_A$ is the column space of A.

► EXAMPLE 14 Kernel and Range of the Zero Transformation

Let $T: V \to W$ be the zero transformation. Since T maps every vector in V into 0, it
follows that ker(T) = V. Moreover, since 0 is the only image under T of vectors in V, it
follows that R(T) = {0}.

► EXAMPLE 15 Kernel and Range of the Identity Operator

Let $I: V \to V$ be the identity operator. Since $I(\mathbf{v}) = \mathbf{v}$ for all vectors in V, every vector
in V is the image of some vector (namely, itself); thus R(I) = V. Since the only vector
that I maps into 0 is 0, it follows that ker(I) = {0}.

► EXAMPLE 16 Kernel and Range of an Orthogonal Projection

Let $T: R^3 \to R^3$ be the orthogonal projection onto the xy-plane. As illustrated in
Figure 8.1.2a, the points that T maps into 0 = (0, 0, 0) are precisely those on the z-axis, so
ker(T) is the set of points of the form (0, 0, z). As illustrated in Figure 8.1.2b, T maps
the points in $R^3$ to the xy-plane, where each point in that plane is the image of each point
on the vertical line above it. Thus, R(T) is the set of points of the form (x, y, 0).


► Figure 8.1.2 (a) ker(T) is the z-axis.


► EXAMPLE 17 Kernel and Range of a Rotation

Let $T: R^2 \to R^2$ be the linear operator that rotates each vector in the xy-plane through
the angle $\theta$ (Figure 8.1.3). Since every vector in the xy-plane can be obtained by rotating
some vector through the angle $\theta$, it follows that $R(T) = R^2$. Moreover, the only vector
that rotates into 0 is 0, so ker(T) = {0}.


► EXAMPLE 18 (Calculus Required) Kernel of a Differentiation Transformation

Let $V = C^1(-\infty, \infty)$ be the vector space of functions with continuous first derivatives on
$(-\infty, \infty)$, let $W = F(-\infty, \infty)$ be the vector space of all real-valued functions defined on
$(-\infty, \infty)$, and let $D: V \to W$ be the differentiation transformation $D(\mathbf{f}) = f'(x)$. The
kernel of D is the set of functions in V with derivative zero. From calculus, this is the
set of constant functions on $(-\infty, \infty)$. ◄

Properties of Kernel and Range

In all of the preceding examples, ker(T) and R(T) turned out to be subspaces. In
Examples 14, 15, and 17 they were either the zero subspace or the entire vector space. In
Example 16 the kernel was a line through the origin, and the range was a plane through
the origin, both of which are subspaces of $R^3$. All of this is a consequence of the following
general theorem.


THEOREM 8.1.3 If $T: V \to W$ is a linear transformation, then:

(a) The kernel of T is a subspace of V.

(b) The range of T is a subspace of W.

Proof (a) To show that ker(T) is a subspace, we must show that it contains at least
one vector and is closed under addition and scalar multiplication. By part (a) of
Theorem 8.1.1, the vector 0 is in ker(T), so the kernel contains at least one vector. Let $\mathbf{v}_1$ and
$\mathbf{v}_2$ be vectors in ker(T), and let k be any scalar. Then

$$T(\mathbf{v}_1 + \mathbf{v}_2) = T(\mathbf{v}_1) + T(\mathbf{v}_2) = \mathbf{0} + \mathbf{0} = \mathbf{0}$$

so $\mathbf{v}_1 + \mathbf{v}_2$ is in ker(T). Also,

$$T(k\mathbf{v}_1) = kT(\mathbf{v}_1) = k\mathbf{0} = \mathbf{0}$$

so $k\mathbf{v}_1$ is in ker(T).

Proof (b) To show that R(T) is a subspace of W, we must show that it contains at least
one vector and is closed under addition and scalar multiplication. However, it contains
at least the zero vector of W since $T(\mathbf{0}) = \mathbf{0}$ by part (a) of Theorem 8.1.1. To prove
that it is closed under addition and scalar multiplication, we must show that if $\mathbf{w}_1$ and
$\mathbf{w}_2$ are vectors in R(T), and if k is any scalar, then there exist vectors a and b in V for
which

$$T(\mathbf{a}) = \mathbf{w}_1 + \mathbf{w}_2 \quad\text{and}\quad T(\mathbf{b}) = k\mathbf{w}_1 \tag{4}$$

But the fact that $\mathbf{w}_1$ and $\mathbf{w}_2$ are in R(T) tells us there exist vectors $\mathbf{v}_1$ and $\mathbf{v}_2$ in V such
that

$$T(\mathbf{v}_1) = \mathbf{w}_1 \quad\text{and}\quad T(\mathbf{v}_2) = \mathbf{w}_2$$

The following computations complete the proof by showing that the vectors $\mathbf{a} = \mathbf{v}_1 + \mathbf{v}_2$
and $\mathbf{b} = k\mathbf{v}_1$ satisfy the equations in (4):

$$T(\mathbf{a}) = T(\mathbf{v}_1 + \mathbf{v}_2) = T(\mathbf{v}_1) + T(\mathbf{v}_2) = \mathbf{w}_1 + \mathbf{w}_2$$

$$T(\mathbf{b}) = T(k\mathbf{v}_1) = kT(\mathbf{v}_1) = k\mathbf{w}_1$$


► EXAMPLE 19 (Calculus Required) Application to Differential Equations

Differential equations of the form

$$y'' + \omega^2 y = 0 \quad (\omega \text{ a positive constant}) \tag{5}$$

arise in the study of vibrations. The set of all solutions of this equation on the interval
$(-\infty, \infty)$ is the kernel of the linear transformation $D: C^2(-\infty, \infty) \to C(-\infty, \infty)$, given by

$$D(y) = y'' + \omega^2 y$$

It is proved in standard textbooks on differential equations that the kernel is a
two-dimensional subspace of $C^2(-\infty, \infty)$, so that if we can find two linearly independent
solutions of (5), then all other solutions can be expressed as linear combinations of those
two. We leave it for you to confirm by differentiating that

$$y_1 = \cos \omega x \quad\text{and}\quad y_2 = \sin \omega x$$

are solutions of (5). These functions are linearly independent since neither is a scalar
multiple of the other, and thus

$$y = c_1 \cos \omega x + c_2 \sin \omega x \tag{6}$$

is a “general solution” of (5) in the sense that every choice of $c_1$ and $c_2$ produces a
solution, and every solution is of this form.


Rank and Nullity of Linear Transformations

In Definition 1 of Section 4.8 we defined the notions of rank and nullity for an m × n
matrix, and in Theorem 4.8.2, which we called the Dimension Theorem for Matrices, we
proved that the sum of the rank and nullity is n. We will show next that this result is
a special case of a more general result about linear transformations. We start with the
following definition.


DEFINITION 3 Let $T: V \to W$ be a linear transformation. If the range of T is
finite-dimensional, then its dimension is called the rank of T; and if the kernel of T is
finite-dimensional, then its dimension is called the nullity of T. The rank of T is
denoted by rank(T) and the nullity of T by nullity(T).


The following theorem, whose proof is optional, generalizes Theorem 4.8.2. 
THEOREM 8.1.4 Dimension Theorem for Linear Transformations

If $T: V \to W$ is a linear transformation from a finite-dimensional vector space $V$ to a vector space $W$, then the range of $T$ is finite-dimensional, and

$\operatorname{rank}(T) + \operatorname{nullity}(T) = \dim(V)$ (7)


In the special case where $A$ is an $m \times n$ matrix and $T_A: R^n \to R^m$ is multiplication by $A$, the kernel of $T_A$ is the null space of $A$, and the range of $T_A$ is the column space of $A$. Thus, it follows from Theorem 8.1.4 that

$\operatorname{rank}(T_A) + \operatorname{nullity}(T_A) = n$
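For a concrete matrix, the two dimensions in this formula can be counted independently and then compared. The sketch below is an illustration under stated assumptions: the helper names `rref` and `null_space_basis` and the sample matrix `A` are hypothetical, and exact `Fraction` arithmetic is used so that no rounding affects the pivot tests.

```python
from fractions import Fraction

def rref(rows):
    """Return (reduced row echelon form, pivot column indices), computed exactly."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(m[0])):
        # Look for a row at or below r with a nonzero entry in column c.
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        pivot = m[r][c]
        m[r] = [x / pivot for x in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    return m, pivots

def null_space_basis(rows):
    """Build one null-space basis vector per free (non-pivot) column."""
    R, pivots = rref(rows)
    n = len(rows[0])
    basis = []
    for f in (c for c in range(n) if c not in pivots):
        v = [Fraction(0)] * n
        v[f] = Fraction(1)
        for i, p in enumerate(pivots):
            v[p] = -R[i][f]
        basis.append(v)
    return basis

# Sample 3 x 4 matrix (an assumption for illustration); row 3 = row 1 + row 2.
A = [[1, 2, 0, 1],
     [2, 4, 1, 3],
     [3, 6, 1, 4]]

_, pivots = rref(A)
rank = len(pivots)              # dimension of the column space of A
kernel = null_space_basis(A)
nullity = len(kernel)           # dimension of the null space of A
print(rank, nullity, rank + nullity == len(A[0]))
```

The rank is counted from pivot columns and the nullity from explicitly constructed kernel vectors, so the final comparison is a genuine check of the formula for this one matrix rather than a restatement of it.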


Proof of Theorem 8.1.4 Assume that $V$ is $n$-dimensional. We must show that

$\dim(R(T)) + \dim(\ker(T)) = n$

We will give the proof for the case where $1 \le \dim(\ker(T)) < n$. The cases where $\dim(\ker(T)) = 0$ and $\dim(\ker(T)) = n$ are left as exercises. Assume $\dim(\ker(T)) = r$, and let $\mathbf{v}_1, \ldots, \mathbf{v}_r$ be a basis for the kernel. Since $\{\mathbf{v}_1, \ldots, \mathbf{v}_r\}$ is linearly independent, Theorem 4.5.5(b) states that there are $n - r$ vectors, $\mathbf{v}_{r+1}, \ldots, \mathbf{v}_n$, such that the extended set $\{\mathbf{v}_1, \ldots, \mathbf{v}_r, \mathbf{v}_{r+1}, \ldots, \mathbf{v}_n\}$ is a basis for $V$. To complete the proof, we will show that the $n - r$ vectors in the set $S = \{T(\mathbf{v}_{r+1}), \ldots, T(\mathbf{v}_n)\}$ form a basis for the range of $T$. It will then follow that

$\dim(R(T)) + \dim(\ker(T)) = (n - r) + r = n$

First we show that $S$ spans the range of $T$. If $\mathbf{b}$ is any vector in the range of $T$, then $\mathbf{b} = T(\mathbf{v})$ for some vector $\mathbf{v}$ in $V$. Since $\{\mathbf{v}_1, \ldots, \mathbf{v}_r, \mathbf{v}_{r+1}, \ldots, \mathbf{v}_n\}$ is a basis for $V$, the vector $\mathbf{v}$ can be written in the form

$\mathbf{v} = c_1\mathbf{v}_1 + \cdots + c_r\mathbf{v}_r + c_{r+1}\mathbf{v}_{r+1} + \cdots + c_n\mathbf{v}_n$

Since $\mathbf{v}_1, \ldots, \mathbf{v}_r$ lie in the kernel of $T$, we have $T(\mathbf{v}_1) = \cdots = T(\mathbf{v}_r) = \mathbf{0}$, so

$\mathbf{b} = T(\mathbf{v}) = c_{r+1}T(\mathbf{v}_{r+1}) + \cdots + c_nT(\mathbf{v}_n)$

Thus $S$ spans the range of $T$.

Finally, we show that $S$ is a linearly independent set and consequently forms a basis for the range of $T$. Suppose that some linear combination of the vectors in $S$ is zero; that is,

$k_{r+1}T(\mathbf{v}_{r+1}) + \cdots + k_nT(\mathbf{v}_n) = \mathbf{0}$ (8)

We must show that $k_{r+1} = \cdots = k_n = 0$. Since $T$ is linear, (8) can be rewritten as

$T(k_{r+1}\mathbf{v}_{r+1} + \cdots + k_n\mathbf{v}_n) = \mathbf{0}$

which says that $k_{r+1}\mathbf{v}_{r+1} + \cdots + k_n\mathbf{v}_n$ is in the kernel of $T$. This vector can therefore be written as a linear combination of the basis vectors $\{\mathbf{v}_1, \ldots, \mathbf{v}_r\}$, say

$k_{r+1}\mathbf{v}_{r+1} + \cdots + k_n\mathbf{v}_n = k_1\mathbf{v}_1 + \cdots + k_r\mathbf{v}_r$

Thus,

$k_1\mathbf{v}_1 + \cdots + k_r\mathbf{v}_r - k_{r+1}\mathbf{v}_{r+1} - \cdots - k_n\mathbf{v}_n = \mathbf{0}$

Since $\{\mathbf{v}_1, \ldots, \mathbf{v}_n\}$ is linearly independent, all of the $k$'s are zero; in particular, $k_{r+1} = \cdots = k_n = 0$, which completes the proof.
Exercise Set 8.1


In Exercises 1-2, suppose that $T$ is a mapping whose domain is the vector space $M_{22}$. In each part, determine whether $T$ is a linear transformation, and if so, find its kernel.

1. (a) $T(A) = A^2$ (b) $T(A) = \operatorname{tr}(A)$

(c) $T(A) = A + A^T$

2. (a) T(A) = (A)„ (b) T(A) = 0 2x2 

(c) T(A) = cA 

In Exercises 3-9, determine whether the mapping $T$ is a linear transformation, and if so, find its kernel.

3. $T: R^3 \to R$, where $T(\mathbf{u}) = \|\mathbf{u}\|$.

4. $T: R^3 \to R^3$, where $\mathbf{v}_0$ is a fixed vector in $R^3$ and $T(\mathbf{u}) = \mathbf{u} \times \mathbf{v}_0$.

5. $T: M_{22} \to M_{23}$, where $B$ is a fixed $2 \times 3$ matrix and $T(A) = AB$.

6. $T: M_{22} \to R$, where

(a) $T\left(\begin{bmatrix} a & b \\ c & d \end{bmatrix}\right) = 3a - 4b + c - d$

(b) $T\left(\begin{bmatrix} a & b \\ c & d \end{bmatrix}\right) = a^2 + b^2$

7. $T: P_2 \to P_2$, where

(a) $T(a_0 + a_1x + a_2x^2) = a_0 + a_1(x + 1) + a_2(x + 1)^2$

(b) $T(a_0 + a_1x + a_2x^2) = (a_0 + 1) + (a_1 + 1)x + (a_2 + 1)x^2$

8. $T: F(-\infty, \infty) \to F(-\infty, \infty)$, where

(a) $T(f(x)) = 1 + f(x)$ (b) $T(f(x)) = f(x + 1)$

9. $T: R^\infty \to R^\infty$, where

$T(u_0, u_1, \ldots, u_n, \ldots) = (0, u_0, u_1, u_2, \ldots, u_n, \ldots)$

10. Let $T: P_2 \to P_3$ be the linear transformation defined by $T(p(x)) = xp(x)$. Which of the following are in $\ker(T)$?

(a) $x^2$ (b) $0$ (c) $1 + x$ (d) $-x$

11. Let $T: P_2 \to P_3$ be the linear transformation in Exercise 10. Which of the following are in $R(T)$?

(a) $x + x^2$ (b) $1 + x$ (c) $3 - x^2$ (d) $-x$

12. Let $V$ be any vector space, and let $T: V \to V$ be defined by $T(\mathbf{v}) = 3\mathbf{v}$.

(a) What is the kernel of $T$?

(b) What is the range of $T$?

13. In each part, use the given information to find the nullity of the linear transformation $T$.

(a) $T: R^5 \to P_5$ has rank 3.

(b) $T: P_4 \to P_2$ has rank 1.

(c) The range of $T: M_{mn} \to R^3$ is $R^3$.

(d) $T: M_{22} \to M_{22}$ has rank 3.

14. In each part, use the given information to find the rank of the 
linear transformation T. 

(a) T : R 1 — > M 32 has nullity 2. 

(b) T : Pi — > R has nullity 1. 

(c) The null space of T : P^—y P 5 is P5. 

(d) T : P n — > M mn has nullity 3. 

15. Let $T: M_{22} \to M_{22}$ be the dilation operator with factor $k = 3$.

(a) Find $T\left(\begin{bmatrix} 1 & 2 \\ 4 & 3 \end{bmatrix}\right)$.

(b) Find the rank and nullity of $T$.

16. Let $T: P_2 \to P_2$ be the contraction operator with factor $k = 1/4$.

(a) Find $T(1 + 4x + 8x^2)$.

(b) Find the rank and nullity of $T$.

17. Let $T: P_2 \to R^3$ be the evaluation transformation at the sequence of points $-1, 0, 1$. Find

(a) $T(x^2)$ (b) $\ker(T)$ (c) $R(T)$

18. Let $V$ be the subspace of $C[0, 2\pi]$ spanned by the vectors $1$, $\sin x$, and $\cos x$, and let $T: V \to R^3$ be the evaluation transformation at the sequence of points $0, \pi, 2\pi$. Find

(a) $T(1 + \sin x + \cos x)$ (b) $\ker(T)$

(c) $R(T)$

19. Consider the basis $S = \{\mathbf{v}_1, \mathbf{v}_2\}$ for $R^2$, where $\mathbf{v}_1 = (1, 1)$ and $\mathbf{v}_2 = (1, 0)$, and let $T: R^2 \to R^2$ be the linear operator for which

$T(\mathbf{v}_1) = (1, -2)$ and $T(\mathbf{v}_2) = (-4, 1)$

Find a formula for $T(x_1, x_2)$, and use that formula to find $T(5, -3)$.

20. Consider the basis $S = \{\mathbf{v}_1, \mathbf{v}_2\}$ for $R^2$, where $\mathbf{v}_1 = (-2, 1)$ and $\mathbf{v}_2 = (1, 3)$, and let $T: R^2 \to R^3$ be the linear transformation such that

$T(\mathbf{v}_1) = (-1, 2, 0)$ and $T(\mathbf{v}_2) = (0, -3, 5)$

Find a formula for $T(x_1, x_2)$, and use that formula to find $T(2, -3)$.

21. Consider the basis $S = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ for $R^3$, where $\mathbf{v}_1 = (1, 1, 1)$, $\mathbf{v}_2 = (1, 1, 0)$, and $\mathbf{v}_3 = (1, 0, 0)$, and let $T: R^3 \to R^3$ be the linear operator for which

$T(\mathbf{v}_1) = (2, -1, 4)$, $T(\mathbf{v}_2) = (3, 0, 1)$, $T(\mathbf{v}_3) = (-1, 5, 1)$

Find a formula for $T(x_1, x_2, x_3)$, and use that formula to find $T(2, 4, -1)$.

22. Consider the basis $S = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ for $R^3$, where $\mathbf{v}_1 = (1, 2, 1)$, $\mathbf{v}_2 = (2, 9, 0)$, and $\mathbf{v}_3 = (3, 3, 4)$, and let $T: R^3 \to R^2$ be the linear transformation for which

$T(\mathbf{v}_1) = (1, 0)$, $T(\mathbf{v}_2) = (-1, 1)$, $T(\mathbf{v}_3) = (0, 1)$

Find a formula for $T(x_1, x_2, x_3)$, and use that formula to find $T(7, 13, 7)$.

23. Let $T: P_3 \to P_2$ be the mapping defined by

$T(a_0 + a_1x + a_2x^2 + a_3x^3) = 5a_0 + a_3x^2$

(a) Show that $T$ is linear.

(b) Find a basis for the kernel of $T$.

(c) Find a basis for the range of $T$.

24. Let $T: P_2 \to P_2$ be the mapping defined by

$T(a_0 + a_1x + a_2x^2) = 3a_0 + a_1x + (a_0 + a_1)x^2$

(a) Show that $T$ is linear.

(b) Find a basis for the kernel of $T$.

(c) Find a basis for the range of $T$.

25. (a) (Calculus required) Let $D: P_3 \to P_2$ be the differentiation transformation $D(\mathbf{p}) = p'(x)$. What is the kernel of $D$?

(b) (Calculus required) Let $J: P_1 \to R$ be the integration transformation $J(\mathbf{p}) = \int_{-1}^{1} p(x)\,dx$. What is the kernel of $J$?

26. (Calculus required) Let $V = C[a, b]$ be the vector space of continuous functions on $[a, b]$, and let $T: V \to V$ be the transformation defined by

$T(\mathbf{f}) = 5f(x) + 3\int_a^x f(t)\,dt$

Is $T$ a linear operator?

27. (Calculus required) Let $V$ be the vector space of real-valued functions with continuous derivatives of all orders on the interval $(-\infty, \infty)$, and let $W = F(-\infty, \infty)$ be the vector space of real-valued functions defined on $(-\infty, \infty)$.

(a) Find a linear transformation $T: V \to W$ whose kernel is $P_3$.

(b) Find a linear transformation $T: V \to W$ whose kernel is $P_n$.

28. For a positive integer $n > 1$, let $T: M_{nn} \to R$ be the linear transformation defined by $T(A) = \operatorname{tr}(A)$, where $A$ is an $n \times n$ matrix with real entries. Determine the dimension of $\ker(T)$.

29. (a) Let $T: V \to R^3$ be a linear transformation from a vector space $V$ to $R^3$. Geometrically, what are the possibilities for the range of $T$?

(b) Let $T: R^3 \to W$ be a linear transformation from $R^3$ to a vector space $W$. Geometrically, what are the possibilities for the kernel of $T$?

30. In each part, determine whether the mapping $T: P_n \to P_n$ is linear.

(a) $T(p(x)) = p(x + 1)$

(b) $T(p(x)) = p(x) + 1$

31. Let $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$ be vectors in a vector space $V$, and let $T: V \to R^3$ be a linear transformation for which

$T(\mathbf{v}_1) = (1, -1, 2)$, $T(\mathbf{v}_2) = (0, 3, 2)$, $T(\mathbf{v}_3) = (-3, 1, 2)$

Find $T(2\mathbf{v}_1 - 3\mathbf{v}_2 + 4\mathbf{v}_3)$.


Working with Proofs

32. Let $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}$ be a basis for a vector space $V$, and let $T: V \to W$ be a linear transformation. Prove that if

$T(\mathbf{v}_1) = T(\mathbf{v}_2) = \cdots = T(\mathbf{v}_n) = \mathbf{0}$

then $T$ is the zero transformation.

33. Let $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}$ be a basis for a vector space $V$, and let $T: V \to V$ be a linear operator. Prove that if

$T(\mathbf{v}_1) = \mathbf{v}_1$, $T(\mathbf{v}_2) = \mathbf{v}_2$, $\ldots$, $T(\mathbf{v}_n) = \mathbf{v}_n$

then $T$ is the identity transformation on $V$.

34. Prove: If $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}$ is a basis for a vector space $V$ and $\mathbf{w}_1, \mathbf{w}_2, \ldots, \mathbf{w}_n$ are vectors in a vector space $W$, not necessarily distinct, then there exists a linear transformation $T: V \to W$ such that

$T(\mathbf{v}_1) = \mathbf{w}_1$, $T(\mathbf{v}_2) = \mathbf{w}_2$, $\ldots$, $T(\mathbf{v}_n) = \mathbf{w}_n$


True-False Exercises

TF. In parts (a)-(i) determine whether the statement is true or false, and justify your answer.

(a) If $T(c_1\mathbf{v}_1 + c_2\mathbf{v}_2) = c_1T(\mathbf{v}_1) + c_2T(\mathbf{v}_2)$ for all vectors $\mathbf{v}_1$ and $\mathbf{v}_2$ in $V$ and all scalars $c_1$ and $c_2$, then $T$ is a linear transformation.

(b) If $\mathbf{v}$ is a nonzero vector in $V$, then there is exactly one linear transformation $T: V \to W$ such that $T(-\mathbf{v}) = -T(\mathbf{v})$.

(c) There is exactly one linear transformation $T: V \to W$ for which $T(\mathbf{u} + \mathbf{v}) = T(\mathbf{u} - \mathbf{v})$ for all vectors $\mathbf{u}$ and $\mathbf{v}$ in $V$.

(d) If $\mathbf{v}_0$ is a nonzero vector in $V$, then the formula $T(\mathbf{v}) = \mathbf{v}_0 + \mathbf{v}$ defines a linear operator on $V$.

(e) The kernel of a linear transformation is a vector space.

(f) The range of a linear transformation is a vector space.

(g) If $T: P_6 \to M_{22}$ is a linear transformation, then the nullity of $T$ is 3.

(h) The function $T: M_{22} \to R$ defined by $T(A) = \det A$ is a linear transformation.

(i) The linear transformation $T: M_{22} \to M_{22}$ defined by

$T(A) = \begin{bmatrix} 1 & 3 \\ 2 & 6 \end{bmatrix} A$

has rank 1.
8.2 Compositions and Inverse Transformations

In Section 4.10 we discussed compositions and inverses of matrix transformations. In this section we will extend some of those ideas to general linear transformations.

One-to-One and Onto

To set the groundwork for our discussion in this section we will need the following definitions, which are illustrated in Figure 8.2.1.


DEFINITION 1 If $T: V \to W$ is a linear transformation from a vector space $V$ to a vector space $W$, then $T$ is said to be one-to-one if $T$ maps distinct vectors in $V$ into distinct vectors in $W$.


DEFINITION 2 If $T: V \to W$ is a linear transformation from a vector space $V$ to a vector space $W$, then $T$ is said to be onto (or onto $W$) if every vector in $W$ is the image of at least one vector in $V$.


Figure 8.2.1 (four panels) One-to-one: distinct vectors in $V$ have distinct images in $W$. Not one-to-one: there exist distinct vectors in $V$ with the same image. Onto $W$: every vector in $W$ is the image of some vector in $V$. Not onto $W$: not every vector in $W$ is the image of some vector in $V$.


THEOREM 8.2.1 If $T: V \to W$ is a linear transformation, then the following statements are equivalent.

(a) $T$ is one-to-one.

(b) $\ker(T) = \{\mathbf{0}\}$.


Proof (a) ⇒ (b) Since $T$ is linear, we know that $T(\mathbf{0}) = \mathbf{0}$ by Theorem 8.1.1(a). Since $T$ is one-to-one, there can be no other vectors in $V$ that map into $\mathbf{0}$, so $\ker(T) = \{\mathbf{0}\}$.

(b) ⇒ (a) Assume that $\ker(T) = \{\mathbf{0}\}$. If $\mathbf{u}$ and $\mathbf{v}$ are distinct vectors in $V$, then $\mathbf{u} - \mathbf{v} \ne \mathbf{0}$. This implies that $T(\mathbf{u} - \mathbf{v}) \ne \mathbf{0}$, for otherwise $\ker(T)$ would contain a nonzero vector. Since $T$ is linear, it follows that

$T(\mathbf{u}) - T(\mathbf{v}) = T(\mathbf{u} - \mathbf{v}) \ne \mathbf{0}$

so $T$ maps distinct vectors in $V$ into distinct vectors in $W$ and hence is one-to-one.

In the special case where $V$ is finite-dimensional and $T$ is a linear operator on $V$, we can add a third statement to those in Theorem 8.2.1.
THEOREM 8.2.2 If $V$ and $W$ are finite-dimensional vector spaces with the same dimension, and if $T: V \to W$ is a linear transformation, then the following statements are equivalent.

(a) $T$ is one-to-one.

(b) $\ker(T) = \{\mathbf{0}\}$.

(c) $T$ is onto [i.e., $R(T) = W$].

Proof We already know that (a) and (b) are equivalent by Theorem 8.2.1, so it suffices to show that (b) and (c) are equivalent. We leave it for you to do this by assuming that $\dim(V) = n$ and applying Theorem 8.1.4.

The requirement in Theorem 8.2.2 that $V$ and $W$ have the same dimension is essential for the validity of the theorem. In the exercises we will ask you to prove the following facts for the case where they do not have the same dimension.

If $\dim(W) < \dim(V)$, then $T$ cannot be one-to-one.

If $\dim(V) < \dim(W)$, then $T$ cannot be onto.

Stated informally, if a linear transformation maps a “bigger” space to a “smaller” space, then some points in the “bigger” space must have the same image; and if a linear transformation maps a “smaller” space to a “bigger” space, then there must be points in the “bigger” space that are not images of any points in the “smaller” space.

► EXAMPLE 1 Matrix Transformations

If $T_A: R^n \to R^m$ is multiplication by an $m \times n$ matrix $A$, then it follows from the foregoing discussion that $T_A$ is not one-to-one if $m < n$ and is not onto if $n < m$. In the case where $m = n$ we know from Theorem 4.10.2 that $T_A$ is both one-to-one and onto if and only if $A$ is invertible.

► EXAMPLE 2 Basic Transformations That Are One-to-One and Onto

The linear transformations $T_1: P_3 \to R^4$ and $T_2: M_{22} \to R^4$ defined by

$T_1(a + bx + cx^2 + dx^3) = (a, b, c, d)$

$T_2\left(\begin{bmatrix} a & b \\ c & d \end{bmatrix}\right) = (a, b, c, d)$

are both one-to-one and onto (verify by showing that their kernels contain only the zero vector).

► EXAMPLE 3 A One-to-One Linear Transformation That Is Not Onto

Let $T: P_n \to P_{n+1}$ be the linear transformation

$T(\mathbf{p}) = T(p(x)) = xp(x)$

discussed in Example 5 of Section 8.1. If

$\mathbf{p} = p(x) = c_0 + c_1x + \cdots + c_nx^n$ and $\mathbf{q} = q(x) = d_0 + d_1x + \cdots + d_nx^n$

are distinct polynomials, then they differ in at least one coefficient. Thus,

$T(\mathbf{p}) = c_0x + c_1x^2 + \cdots + c_nx^{n+1}$ and $T(\mathbf{q}) = d_0x + d_1x^2 + \cdots + d_nx^{n+1}$

also differ in at least one coefficient. It follows that $T$ is one-to-one since it maps distinct polynomials $\mathbf{p}$ and $\mathbf{q}$ into distinct polynomials $T(\mathbf{p})$ and $T(\mathbf{q})$. However, it is not onto because all images under $T$ have a zero constant term. Thus, for example, there is no vector in $P_n$ that maps into the constant polynomial 1.

► EXAMPLE 4 Shifting Operators

Let $V = R^\infty$ be the sequence space discussed in Example 3 of Section 4.1, and consider the linear “shifting operators” on $V$ defined by

$T_1(u_1, u_2, \ldots, u_n, \ldots) = (0, u_1, u_2, \ldots, u_n, \ldots)$

$T_2(u_1, u_2, \ldots, u_n, \ldots) = (u_2, u_3, \ldots, u_n, \ldots)$

(a) Show that $T_1$ is one-to-one but not onto.

(b) Show that $T_2$ is onto but not one-to-one.

Solution (a) The operator $T_1$ is one-to-one because distinct sequences in $R^\infty$ obviously have distinct images. This operator is not onto because no vector in $R^\infty$ maps into the sequence $(1, 0, 0, \ldots, 0, \ldots)$, for example.

Solution (b) The operator $T_2$ is not one-to-one because, for example, the vectors $(1, 0, 0, \ldots, 0, \ldots)$ and $(2, 0, 0, \ldots, 0, \ldots)$ both map into $(0, 0, 0, \ldots, 0, \ldots)$. This operator is onto because every possible sequence of real numbers can be obtained with an appropriate choice of the numbers $u_2, u_3, \ldots, u_n, \ldots$

Why does Example 4 not violate Theorem 8.2.2?

► EXAMPLE 5 Differentiation Is Not One-to-One (calculus required)

Let

$D: C^1(-\infty, \infty) \to F(-\infty, \infty)$

be the differentiation transformation discussed in Example 11 of Section 8.1. This linear transformation is not one-to-one because it maps functions that differ by a constant into the same function. For example,

$D(x^2) = D(x^2 + 1) = 2x$


Composition of Linear Transformations

The following definition extends Formula (1) of Section 4.10 to general linear transformations.


DEFINITION 3 If $T_1: U \to V$ and $T_2: V \to W$ are linear transformations, then the composition of $T_2$ with $T_1$, denoted by $T_2 \circ T_1$ (which is read “$T_2$ circle $T_1$”), is the function defined by the formula

$(T_2 \circ T_1)(\mathbf{u}) = T_2(T_1(\mathbf{u}))$ (1)

where $\mathbf{u}$ is a vector in $U$.


Note that the word “with” establishes the order of the operations in a composition. The composition of $T_2$ with $T_1$ is

$(T_2 \circ T_1)(\mathbf{u}) = T_2(T_1(\mathbf{u}))$

whereas the composition of $T_1$ with $T_2$ is

$(T_1 \circ T_2)(\mathbf{u}) = T_1(T_2(\mathbf{u}))$

It is not true, in general, that $T_1 \circ T_2 = T_2 \circ T_1$.


Remark Observe that this definition requires that the domain of $T_2$ (which is $V$) contain the range of $T_1$. This is essential for the formula $T_2(T_1(\mathbf{u}))$ to make sense (Figure 8.2.2).


Figure 8.2.2 The composition of $T_2$ with $T_1$.
Our next theorem shows that the composition of two linear transformations is itself a linear transformation.


THEOREM 8.2.3 If $T_1: U \to V$ and $T_2: V \to W$ are linear transformations, then $(T_2 \circ T_1): U \to W$ is also a linear transformation.


Proof If $\mathbf{u}$ and $\mathbf{v}$ are vectors in $U$ and $c$ is a scalar, then it follows from (1) and the linearity of $T_1$ and $T_2$ that

$(T_2 \circ T_1)(\mathbf{u} + \mathbf{v}) = T_2(T_1(\mathbf{u} + \mathbf{v})) = T_2(T_1(\mathbf{u}) + T_1(\mathbf{v}))$
$= T_2(T_1(\mathbf{u})) + T_2(T_1(\mathbf{v}))$
$= (T_2 \circ T_1)(\mathbf{u}) + (T_2 \circ T_1)(\mathbf{v})$

and

$(T_2 \circ T_1)(c\mathbf{u}) = T_2(T_1(c\mathbf{u})) = T_2(cT_1(\mathbf{u}))$
$= cT_2(T_1(\mathbf{u})) = c(T_2 \circ T_1)(\mathbf{u})$

Thus, $T_2 \circ T_1$ satisfies the two requirements of a linear transformation.


► EXAMPLE 6 Composition of Linear Transformations

Let $T_1: P_1 \to P_2$ and $T_2: P_2 \to P_2$ be the linear transformations given by the formulas

$T_1(p(x)) = xp(x)$ and $T_2(p(x)) = p(2x + 4)$

Then the composition $(T_2 \circ T_1): P_1 \to P_2$ is given by the formula

$(T_2 \circ T_1)(p(x)) = T_2(T_1(p(x))) = T_2(xp(x)) = (2x + 4)p(2x + 4)$

In particular, if $p(x) = c_0 + c_1x$, then

$(T_2 \circ T_1)(p(x)) = (T_2 \circ T_1)(c_0 + c_1x) = (2x + 4)(c_0 + c_1(2x + 4))$
$= c_0(2x + 4) + c_1(2x + 4)^2$
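The computation in Example 6 can be reproduced mechanically by representing a polynomial as its list of coefficients. The sketch below is one possible encoding, not the text's method: the list representation `[c0, c1, ...]`, the helper names, and the sample values `c0 = 3`, `c1 = 5` are assumptions made for illustration.

```python
def trim(p):
    """Drop trailing zero coefficients (keep at least one entry)."""
    while len(p) > 1 and p[-1] == 0:
        p = p[:-1]
    return p

def poly_add(p, q):
    n = max(len(p), len(q))
    p = p + [0] * (n - len(p))
    q = q + [0] * (n - len(q))
    return trim([a + b for a, b in zip(p, q)])

def poly_mul(p, q):
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return trim(out)

def T1(p):
    """T1(p(x)) = x p(x): every coefficient moves up one degree."""
    return trim([0] + list(p))

def T2(p):
    """T2(p(x)) = p(2x + 4), expanded with Horner's rule on polynomials."""
    result = [0]
    for coeff in reversed(p):
        result = poly_add(poly_mul(result, [4, 2]), [coeff])
    return result

def compose(f, g):
    """Return f o g, the map that applies g first and then f."""
    return lambda p: f(g(p))

T2_after_T1 = compose(T2, T1)

c0, c1 = 3, 5                                   # sample coefficients
lhs = T2_after_T1([c0, c1])                     # (T2 o T1)(c0 + c1 x)
square = poly_mul([4, 2], [4, 2])               # (2x + 4)^2
rhs = poly_add([4 * c0, 2 * c0], [c1 * k for k in square])
print(lhs, rhs)                                 # [92, 86, 20] [92, 86, 20]
```

Note that `compose(T2, T1)` applies `T1` first, matching the convention in Formula (1) that $T_2 \circ T_1$ means "first $T_1$, then $T_2$".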


► EXAMPLE 7 Composition with the Identity Operator

If $T: V \to V$ is any linear operator, and if $I: V \to V$ is the identity operator (Example 3 of Section 8.1), then for all vectors $\mathbf{v}$ in $V$, we have

$(T \circ I)(\mathbf{v}) = T(I(\mathbf{v})) = T(\mathbf{v})$

$(I \circ T)(\mathbf{v}) = I(T(\mathbf{v})) = T(\mathbf{v})$

It follows that $T \circ I$ and $I \circ T$ are the same as $T$; that is,

$T \circ I = T$ and $I \circ T = T$ ◄ (2)


As illustrated in Figure 8.2.3, compositions can be defined for more than two linear transformations. For example, if

$T_1: U \to V$, $T_2: V \to W$, and $T_3: W \to Y$

are linear transformations, then the composition $T_3 \circ T_2 \circ T_1$ is defined by

$(T_3 \circ T_2 \circ T_1)(\mathbf{u}) = T_3(T_2(T_1(\mathbf{u})))$ (3)
Figure 8.2.3 The composition of three linear transformations.

Inverse Linear Transformations

In Theorem 4.10.1 we showed that a matrix operator $T_A: R^n \to R^n$ is one-to-one if and only if the matrix $A$ is invertible, in which case the inverse operator is $T_{A^{-1}}$. We then showed that if $\mathbf{w}$ is the image of a vector $\mathbf{x}$ under the operator $T_A$, then $\mathbf{x}$ is the image under $T_{A^{-1}}$ of the vector $\mathbf{w}$ (see Figure 4.10.8). Our next objective is to extend the notion of invertibility to general linear transformations.

If $T: V \to W$ is a one-to-one linear transformation with range $R(T)$, and if $\mathbf{w}$ is any vector in $R(T)$, then the fact that $T$ is one-to-one means that there is exactly one vector $\mathbf{v}$ in $V$ for which $T(\mathbf{v}) = \mathbf{w}$. This fact allows us to define a new function, called the inverse of $T$ (and denoted by $T^{-1}$), that is defined on the range of $T$ and that maps $\mathbf{w}$ back into $\mathbf{v}$ (Figure 8.2.4).


Figure 8.2.4 The inverse of $T$ maps $T(\mathbf{v})$ back into $\mathbf{v}$.


It can be proved (Exercise 33) that $T^{-1}: R(T) \to V$ is a linear transformation. Moreover, it follows from the definition of $T^{-1}$ that

$T^{-1}(T(\mathbf{v})) = T^{-1}(\mathbf{w}) = \mathbf{v}$ (4)

$T(T^{-1}(\mathbf{w})) = T(\mathbf{v}) = \mathbf{w}$ (5)

so that $T$ and $T^{-1}$, when applied in succession in either order, cancel the effect of each other.


► EXAMPLE 8 An Inverse Transformation

We showed in Example 3 of this section that the linear transformation $T: P_n \to P_{n+1}$ given by

$T(\mathbf{p}) = T(p(x)) = xp(x)$

is one-to-one but not onto. The fact that it is not onto can be seen explicitly from the formula

$T(c_0 + c_1x + \cdots + c_nx^n) = c_0x + c_1x^2 + \cdots + c_nx^{n+1}$ (6)

which makes it clear that the range of $T$ consists of all polynomials in $P_{n+1}$ that have zero constant term. Since $T$ is one-to-one it has an inverse, and from (6) this inverse is given by the formula

$T^{-1}(c_0x + c_1x^2 + \cdots + c_nx^{n+1}) = c_0 + c_1x + \cdots + c_nx^n$

For example, in the case where $n \ge 3$,

$T^{-1}(2x - x^2 + 5x^3 + 3x^4) = 2 - x + 5x^2 + 3x^3$
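In coefficient form, Formula (6) and its inverse are simply shifts of a list. The following minimal sketch assumes polynomials are stored as coefficient lists `[c0, c1, ..., cn]`; the function names are illustrative, and the error raised for inputs outside $R(T)$ reflects the fact that $T^{-1}$ is defined only on the range of $T$.

```python
def T(p):
    """T(p(x)) = x p(x): prepend a zero constant term."""
    return [0] + list(p)

def T_inverse(q):
    """Inverse of T, defined only on R(T): polynomials with zero constant term."""
    if q[0] != 0:
        raise ValueError("not in the range of T: constant term is nonzero")
    return list(q[1:])

q = [0, 2, -1, 5, 3]      # 2x - x^2 + 5x^3 + 3x^4
p = T_inverse(q)          # 2 - x + 5x^2 + 3x^3
print(p)                  # [2, -1, 5, 3]
print(T(p) == q)          # True: T undoes T_inverse on R(T)
```

Calling `T_inverse([1, 0])` raises an error because the constant polynomial 1 is not the image of any polynomial under $T$, which is exactly why $T$ is not onto.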

► EXAMPLE 9 An Inverse Transformation

Let $T: R^3 \to R^3$ be the linear operator defined by the formula

$T(x_1, x_2, x_3) = (3x_1 + x_2, -2x_1 - 4x_2 + 3x_3, 5x_1 + 4x_2 - 2x_3)$

Determine whether $T$ is one-to-one; if so, find $T^{-1}(x_1, x_2, x_3)$.
Solution It follows from Formula (15) of Section 1.8 that the standard matrix for $T$ is

$[T] = \begin{bmatrix} 3 & 1 & 0 \\ -2 & -4 & 3 \\ 5 & 4 & -2 \end{bmatrix}$

(verify). This matrix is invertible, and from Formula (9) of Section 4.10 the standard matrix for $T^{-1}$ is

$[T^{-1}] = [T]^{-1} = \begin{bmatrix} 4 & -2 & -3 \\ -11 & 6 & 9 \\ -12 & 7 & 10 \end{bmatrix}$

It follows that

$T^{-1}\left(\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}\right) = [T^{-1}]\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 4 & -2 & -3 \\ -11 & 6 & 9 \\ -12 & 7 & 10 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 4x_1 - 2x_2 - 3x_3 \\ -11x_1 + 6x_2 + 9x_3 \\ -12x_1 + 7x_2 + 10x_3 \end{bmatrix}$

Expressing this result in horizontal notation yields

$T^{-1}(x_1, x_2, x_3) = (4x_1 - 2x_2 - 3x_3, -11x_1 + 6x_2 + 9x_3, -12x_1 + 7x_2 + 10x_3)$
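The inverse matrix in Example 9 can be recomputed by Gauss-Jordan elimination on the augmented matrix $[\,[T] \mid I\,]$. This is a sketch under stated assumptions: the helper names `inverse` and `apply` are illustrative, and the elimination routine is a generic one rather than the specific procedure referenced in Section 4.10. Exact `Fraction` arithmetic avoids rounding.

```python
from fractions import Fraction

def inverse(a):
    """Invert a square matrix by Gauss-Jordan elimination on [a | I]."""
    n = len(a)
    m = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(a)]
    for c in range(n):
        # Find a usable pivot in column c at or below the diagonal.
        p = next((i for i in range(c, n) if m[i][c] != 0), None)
        if p is None:
            raise ValueError("matrix is singular, so T is not one-to-one")
        m[c], m[p] = m[p], m[c]
        pivot = m[c][c]
        m[c] = [x / pivot for x in m[c]]
        for i in range(n):
            if i != c and m[i][c] != 0:
                f = m[i][c]
                m[i] = [x - f * y for x, y in zip(m[i], m[c])]
    return [row[n:] for row in m]

def apply(a, v):
    """Multiply matrix a by the column vector v."""
    return [sum(x * y for x, y in zip(row, v)) for row in a]

T = [[3, 1, 0],
     [-2, -4, 3],
     [5, 4, -2]]          # standard matrix of T from Example 9

T_inv = inverse(T)
print([[int(x) for x in row] for row in T_inv])
# [[4, -2, -3], [-11, 6, 9], [-12, 7, 10]]

x = [1, 2, 3]
print(apply(T_inv, apply(T, x)) == x)   # True: T_inv undoes T
```

If the matrix were singular, `inverse` would raise an error, which corresponds to $T$ failing to be one-to-one (Theorem 8.2.2).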


Composition of One-to-One Linear Transformations

We conclude this section with a theorem that shows that the composition of one-to-one linear transformations is one-to-one and that the inverse of a composition is the composition of the inverses in the reverse order.


THEOREM 8.2.4 If $T_1: U \to V$ and $T_2: V \to W$ are one-to-one linear transformations, then:

(a) $T_2 \circ T_1$ is one-to-one.

(b) $(T_2 \circ T_1)^{-1} = T_1^{-1} \circ T_2^{-1}$.

Proof (a) We want to show that $T_2 \circ T_1$ maps distinct vectors in $U$ into distinct vectors in $W$. But if $\mathbf{u}$ and $\mathbf{v}$ are distinct vectors in $U$, then $T_1(\mathbf{u})$ and $T_1(\mathbf{v})$ are distinct vectors in $V$ since $T_1$ is one-to-one. This and the fact that $T_2$ is one-to-one imply that

$T_2(T_1(\mathbf{u}))$ and $T_2(T_1(\mathbf{v}))$

are also distinct vectors. But these expressions can also be written as

$(T_2 \circ T_1)(\mathbf{u})$ and $(T_2 \circ T_1)(\mathbf{v})$

so $T_2 \circ T_1$ maps $\mathbf{u}$ and $\mathbf{v}$ into distinct vectors in $W$.

Proof (b) We want to show that

$(T_2 \circ T_1)^{-1}(\mathbf{w}) = (T_1^{-1} \circ T_2^{-1})(\mathbf{w})$

for every vector $\mathbf{w}$ in the range of $T_2 \circ T_1$. For this purpose, let

$\mathbf{u} = (T_2 \circ T_1)^{-1}(\mathbf{w})$ (7)

so our goal is to show that

$\mathbf{u} = (T_1^{-1} \circ T_2^{-1})(\mathbf{w})$

But it follows from (7) that

$(T_2 \circ T_1)(\mathbf{u}) = \mathbf{w}$

or, equivalently,

$T_2(T_1(\mathbf{u})) = \mathbf{w}$

Now, taking $T_2^{-1}$ of each side of this equation, then taking $T_1^{-1}$ of each side of the result, and then using (4) yields (verify)

$\mathbf{u} = T_1^{-1}(T_2^{-1}(\mathbf{w}))$

or, equivalently,

$\mathbf{u} = (T_1^{-1} \circ T_2^{-1})(\mathbf{w})$

In words, part (b) of Theorem 8.2.4 states that the inverse of a composition is the composition of the inverses in the reverse order. This result can be extended to compositions of three or more linear transformations; for example,

$(T_3 \circ T_2 \circ T_1)^{-1} = T_1^{-1} \circ T_2^{-1} \circ T_3^{-1}$ (8)

In the case where $T_A$, $T_B$, and $T_C$ are matrix operators on $R^n$, Formula (8) can be written as

$(T_C \circ T_B \circ T_A)^{-1} = T_A^{-1} \circ T_B^{-1} \circ T_C^{-1}$

or alternatively as

$(T_{CBA})^{-1} = T_{A^{-1}B^{-1}C^{-1}}$ (9)

Note the order of the subscripts on the two sides of Formula (9).
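Formula (9) can be checked on specific matrices, and the same check shows that the order of the inverses matters. In this sketch the three $2 \times 2$ matrices are arbitrary invertible samples chosen for illustration, and `matmul` and `inv2` are hypothetical helper names; the computation verifies one instance and is not a proof.

```python
from fractions import Fraction

def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def inv2(a):
    """Inverse of a 2 x 2 matrix via the adjugate formula, in exact arithmetic."""
    (p, q), (r, s) = a
    det = Fraction(p * s - q * r)
    if det == 0:
        raise ValueError("matrix is not invertible")
    return [[s / det, -q / det], [-r / det, p / det]]

# Sample invertible matrices (assumptions for illustration).
A = [[1, 2], [3, 5]]
B = [[2, 1], [1, 1]]
C = [[0, 1], [-1, 4]]

lhs = inv2(matmul(C, matmul(B, A)))                   # (CBA)^{-1}
rhs = matmul(inv2(A), matmul(inv2(B), inv2(C)))       # A^{-1} B^{-1} C^{-1}
wrong = matmul(inv2(C), matmul(inv2(B), inv2(A)))     # inverses in the original order

print(lhs == rhs)     # True
print(lhs == wrong)   # False for these matrices
```

The second comparison illustrates why the inverses must be composed in reverse order: undoing $T_C \circ T_B \circ T_A$ requires undoing $T_C$ first, since it was applied last.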


Exercise Set 8.2 


In Exercises 1-2, determine whether the linear transformation is one-to-one by finding its kernel and then applying Theorem 8.2.1.

1. (a) $T: R^2 \to R^2$, where $T(x, y) = (y, x)$

(b) $T: R^2 \to R^3$, where $T(x, y) = (x, y, x + y)$

(c) $T: R^3 \to R^2$, where $T(x, y, z) = (x + y + z, x - y - z)$


2. (a) $T: R^2 \to R^3$, where $T(x, y) = (x - y, y - x, 2x - 2y)$

(b) $T: R^2 \to R^2$, where $T(x, y) = (0, 2x + 3y)$

(c) $T: R^2 \to R^2$, where $T(x, y) = (x + y, x - y)$


In Exercises 3-4, determine whether multiplication by $A$ is one-to-one by computing the nullity of $A$ and then applying Theorem 8.2.1.


3. (a) A = 


(b) A = 




1 

2 

0 


7 

4 

0 


4. (a) A = 


1 

2 

3 

1 

0 

0 


2 

1 

9_ 

-3 

1 

0 


6 

2 

0 


f 

4 

1 


5. Use the given information to determine whether the linear transformation is one-to-one.

(a) $T: V \to W$; nullity($T$) = 0

(b) $T: V \to W$; rank($T$) = dim($V$)

(c) $T: V \to W$; dim($W$) < dim($V$)

6. Use the given information to determine whether the linear operator is one-to-one, onto, both, or neither.

(a) $T: V \to V$; nullity($T$) = 0

(b) $T: V \to V$; rank($T$) < dim($V$)

(c) $T: V \to V$; $R(T) = V$

7. Show that the linear transformation $T: P_2 \to R^2$ defined by $T(p(x)) = (p(-1), p(1))$ is not one-to-one by finding a nonzero polynomial that maps into $\mathbf{0} = (0, 0)$. Do you think that this transformation is onto?

8. Show that the linear transformation $T: P_2 \to P_2$ defined by $T(p(x)) = p(x + 1)$ is one-to-one. Do you think that this transformation is onto?

9. Let $\mathbf{a}$ be a fixed vector in $R^3$. Does the formula $T(\mathbf{v}) = \mathbf{a} \times \mathbf{v}$ define a one-to-one linear operator on $R^3$? Explain your reasoning.

10. Let $E$ be a fixed $2 \times 2$ elementary matrix. Does the formula $T(A) = EA$ define a one-to-one linear operator on $M_{22}$? Explain your reasoning.

In Exercises 11-12, compute $(T_2 \circ T_1)(x, y)$.

11. $T_1(x, y) = (2x, 3y)$, $T_2(x, y) = (x - y, x + y)$

(b) A = 
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12. T { (x, y ) = (lx, —3y, x + y), 

T 2 (x, y, z) = (x - y, y + z) 

In Exercises , compute (T 3 o T 2 o T[)(x, y). 

13. T\(x, y) — (- 2y , 3x, x - 2y), T 2 {x, y, z) = (y, z, x), 
T 3 (x, y, z) = (x + z, y - z) 


(a) Show that 7j and T 2 are one-to-one. 

(b) Find formulas for 

Tp\x,y), T 2 \x,y), (T 2 o TO~\x, y) 

(c) Verify that (T 2 o 7)) -1 = 7j _1 o T 2 l . 


14. Ti(x,y) = (x + y,y, -x), 

T 2 (x, y, z) = (0, x + y + z, 3v), 
T 3 (x, y, z) = (3x + 2y, 4z - x — 3 y) 


15. Let 7j : M 22 -» R and T 2 : M 22 -> M 22 be the linear transforma- 
tions given by 7j(A ) = tr(A) and T 2 (A) = A r . 


(a) Find (7j o T 2 )(A), where A = 


b 

d 


23. Let T\.P 2 ~* P 3 and T 2 .P 3 ^> P 3 be the linear transformations 
given by the formulas 

T\ (, p(x )) = xp(x) and T 2 (p(x)) = p(jv + 1) 

(a) Find formulas for Tf 1 ( p(x )), T 2 l (p(x)), and 
(Tf 1 o TP 1 ) (p(x)). 

(b) Verify that (T 2 o T,r' = Jf 1 o Tp\ 


(b) Can you find ( T 2 o T\)(A)2 Explain. 

16. Rework Exercise 1 5 given that 7j : M 22 — > M 22 and 
T 2 :M 22 ^>- M 22 are the linear transformations, Ti(A) = kA 
and T 2 (A) = A T , where k is a scalar. 

17. Suppose that the linear transformations Ti : P 2 ► P 2 and 
T 2 :P 2 -> P 3 are given by the formulas 7i(p(;t)) = p(x + 1) 
and T 2 (p(x)) — xp(x). Find (T 2 o Ti)(ao + ci\X + a 2 x 2 ). 

18. Let T\.P n -¥ P n and T 2 : P n -*■ P„ be the linear operators given 
by Ti(p(x)) = p(x — 1) and T 2 (p(x)) = p(x + 1). Find 
(T o T 2 )(p(x)) and (T 2 o 7i)(p(x)). 

19. Let T : Pi — >■ R 2 be the function defined by the formula 

T(p(x)) = (p(0), p(l)) 


24. Let T a \R 2 ^R\ T b : R 3 R 3 , and T c : R 3 R 3 be the re- 
flections about the jcy-plane, the xz-plane, and the yz-plane, 
respectively. Verify Formula (9) for these linear operators. 


25. Let Tp. V — > V be the dilation Ti(v) = 4v. Find a linear oper- 
ator Tp.V^V such that T\ o T 2 = / and T 2 o 7\ = /. 

26. Let T\'.M 22 — > Pi and T 2 :Pi—>R 2 be the linear transforma- 
a b 


tions given by J] 


= (a + b) + (c + d)x and 


c d 

T 2 (a + bx) — (a, b, a). 

fa) Find the formula for T 2 o 7\ . 

(b) Show that T 2 o T\ is not one-to-one by finding distinct 
2x2 matrices A and B such that 


(a) Find T{ 1 - 2x). 

(b) Show that T is a linear transformation. 

(c) Show that T is one-to-one. 

(d) Find T _1 (2, 3), and sketch its graph. 

20. In each part, determine whether the linear operator 

T : R" —*■ R" is one-to-one; if so, find T~ l (x\, x 2 , . . . , x„). 

(a) T(x u x 2 , . . . , x n ) = (0, x u x 2 , x„_i) 

(b) T(xi, x 2 , x n ) = (x n ,x n -u ...,x 2 , Xl) 

(c) T(x\, X 2 , X n ) = (x 2 , X 3 ,..., X n , X, ) 

21. Let T : R" —> R " be the linear operator defined by the formula 

T(x u x 2 , . . . , x n ) = (i a 3 X \ , a 2 x 2 , . . . , a n x n ) 

where cti, ... ,a n are constants. 

(a) Under what conditions will T have an inverse? 

(b) Assuming that the conditions determined in part (a) are 
satisfied, find a formula for T~ l (xi, x 2 , . . . , x n ). 

22. Let 7) : R 2 — > R 2 and T 2 : R 2 — > R 2 be the linear operators given 
by the formulas 

Tpx, y) = (x + y,x — y) and T 2 (x, y) = (2* + y, x — 2 y) 


(T 2 oT l )(A) = (T 2 oT l )(B) 

(c) Show that T 2 o T\ is not onto by finding a vector (a, b, c) 
in R 1 that is not in the range of T 2 o T t . 

27. Let T : R 3 — > R J be the orthogonal projection of R 3 onto the 
xy-plane. Show that T o T = T . 

28. ( Calculus required) Let V be the vector space C'[0, 1] and let 
T : V — >■ R be defined by 

T(f) = f( 0) + 2/'(0) + 3/'(l) 

Verify that T is a linear transformation. Determine whether 
T is one-to-one, and justify your conclusion. 

29. ( Calculus required) The Fundamental Theorem of Calculus 
implies that integration and differentiation reverse the ac- 
tions of each other. Define a transformation D.P n -±P n _\ 
by D(p(x)) = p'(x), and define J: P n _\ —> P„ by 

J(p(x))= f pit) dt 
Jo 

(a) Show that D and J are linear transformations. 

(b) Explain why J is not the inverse transformation of D. 

(c) Can the domains and/or codomains of D and J be re- 
stricted so they are inverse linear transformations? 
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30. (Calculus required ) Let 


D(f) = f'(x) and 7(f) = f f(t)dt 

Jo 

be the linear transformations in Examples 11 and 12 of Sec- 
tion 8.1. Find (7 o D)(i) for 

(a) f(x) = x 2 + 3x + 2. (b) f(x) = sinx. 


36. Prove: If there exists an onto linear transformation T : V — > W 
then dim(F) > dim(lV). 

37. Prove: If V and W are finite-dimensional vector spaces such 
that dim(VF) < dim(L), then there is no one-to-one linear 
transformation T:V->W. 


31. ( Calculus required) Let 7 : P\ — »■ R be the integration transfor- 
mation 7 Cp) = /j*j p(x)dx. Determine whether 7 is one-to- 
one. Justify your answer. 

32. ( Calculus required) Let D: P n — > P n _ i be the differentiation 
transformation D(p(x)) = p'( x). Determine whether D is 
onto, and justify your answer. 

Working with Proofs 

33. Prove: If T : V — > W is a one-to-one linear transformation, 
then T~ l : R(T) -*■ V is a one-to-one linear transformation. 

34. Use the definition of T 2 oT 2 o 7) given by Formula (3) to 
prove that 

(a) 7) o 72 o 7) is a linear transformation. 

(b) 7) o 7) ° 7) = (T 3 o T 2 ) o T\. 

(c) T } o T 2 o 7j = r 3 o (T 2 o Ti). 

35. Let qo (x ) be a fixed polynomial of degree m , and define a func- 
tion T with domain P n by the formula T(p(x)) = p(qo(x)). 
Prove that T is a linear transformation. 


True-False Exercises 

TF. In parts (a)-(f) determine whether the statement is true or 

false, and justify your answer. 

(a) The composition of two linear transformations is also a linear 
transformation. 

(b) If $T_1: V \to V$ and $T_2: V \to V$ are any two linear operators, then
$T_1 \circ T_2 = T_2 \circ T_1$.

(c) The inverse of a one-to-one linear transformation is a linear 
transformation. 

(d) If a linear transformation T has an inverse, then the kernel of 
T is the zero subspace. 

(e) If $T: R^2 \to R^2$ is the orthogonal projection onto the x-axis,
then $T^{-1}: R^2 \to R^2$ maps each point on the x-axis onto a line
that is perpendicular to the x-axis.

(f) If $T_1: U \to V$ and $T_2: V \to W$ are linear transformations, and
if $T_1$ is not one-to-one, then neither is $T_2 \circ T_1$.


8.3 Isomorphism 

In this section we will establish a fundamental connection between real finite-dimensional
vector spaces and the Euclidean space $R^n$. This connection is not only important
theoretically, but it has practical applications in that it allows us to perform vector
computations in general vector spaces by working with the vectors in $R^n$.

Isomorphism

Although many of the theorems in this text have been concerned exclusively with the
vector space $R^n$, this is not as limiting as it might seem. We will show that the vector
space $R^n$ is the "mother" of all real n-dimensional vector spaces in the sense that every
n-dimensional vector space must have the same algebraic structure as $R^n$ even though
its vectors may not be expressed as n-tuples. To explain what we mean by this, we will
need the following definition.


DEFINITION 1 A linear transformation $T: V \to W$ that is both one-to-one and onto
is said to be an isomorphism, and W is said to be isomorphic to V.


In the exercises we will ask you to show that if $T: V \to W$ is an isomorphism, then
$T^{-1}: W \to V$ is also an isomorphism. Accordingly, we will usually say simply that V and
W are isomorphic and that T is an isomorphism between V and W.

The word isomorphic is derived from the Greek words iso, meaning “identical,” and 
morphe, meaning “form.” This terminology is appropriate because, as we will now 
explain, isomorphic vector spaces have the same “algebraic form,” even though they 
may consist of different kinds of objects. For example, the following diagram illustrates
an isomorphism between $P_2$ and $R^3$:

\[ c_0 + c_1x + c_2x^2 \;\overset{T}{\longleftrightarrow}\; (c_0, c_1, c_2) \]

Although the vectors on the two sides of the arrows are different kinds of objects, the
vector operations on each side mirror those on the other side. For example, for scalar
multiplication we have

\[ k(c_0 + c_1x + c_2x^2) \;\overset{T}{\longleftrightarrow}\; k(c_0, c_1, c_2) \]

\[ kc_0 + kc_1x + kc_2x^2 \;\overset{T}{\longleftrightarrow}\; (kc_0, kc_1, kc_2) \]

and for vector addition we have

\[ (c_0 + c_1x + c_2x^2) + (d_0 + d_1x + d_2x^2) \;\overset{T}{\longleftrightarrow}\; (c_0, c_1, c_2) + (d_0, d_1, d_2) \]

\[ (c_0 + d_0) + (c_1 + d_1)x + (c_2 + d_2)x^2 \;\overset{T}{\longleftrightarrow}\; (c_0 + d_0, c_1 + d_1, c_2 + d_2) \]

The following theorem, which is one of the most basic results in linear algebra, reveals
the fundamental importance of the vector space $R^n$.


THEOREM 8.3.1 Every real n-dimensional vector space is isomorphic to $R^n$.


Theorem 8.3.1 tells us that every real n-dimensional vector space differs from $R^n$
only in notation; the algebraic structures of the two spaces are the same.


Proof Let V be a real n-dimensional vector space. To prove that V is isomorphic to $R^n$
we must find a linear transformation $T: V \to R^n$ that is one-to-one and onto. For this
purpose, let

\[ S = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\} \]

be any basis for V, let

\[ \mathbf{u} = k_1\mathbf{v}_1 + k_2\mathbf{v}_2 + \cdots + k_n\mathbf{v}_n \tag{1} \]

be the representation of a vector u in V as a linear combination of the basis vectors, and
let $T: V \to R^n$ be the coordinate map

\[ T(\mathbf{u}) = (\mathbf{u})_S = (k_1, k_2, \ldots, k_n) \tag{2} \]

We will show that T is an isomorphism (linear, one-to-one, and onto). To prove the
linearity, let u and v be vectors in V, let c be a scalar, and let

\[ \mathbf{u} = k_1\mathbf{v}_1 + k_2\mathbf{v}_2 + \cdots + k_n\mathbf{v}_n \quad\text{and}\quad \mathbf{v} = d_1\mathbf{v}_1 + d_2\mathbf{v}_2 + \cdots + d_n\mathbf{v}_n \tag{3} \]

be the representations of u and v as linear combinations of the basis vectors. Then it
follows from (3) that

\[ \begin{aligned}
T(c\mathbf{u}) &= T(ck_1\mathbf{v}_1 + ck_2\mathbf{v}_2 + \cdots + ck_n\mathbf{v}_n) \\
&= (ck_1, ck_2, \ldots, ck_n) \\
&= c(k_1, k_2, \ldots, k_n) = cT(\mathbf{u})
\end{aligned} \]

and that

\[ \begin{aligned}
T(\mathbf{u} + \mathbf{v}) &= T((k_1 + d_1)\mathbf{v}_1 + (k_2 + d_2)\mathbf{v}_2 + \cdots + (k_n + d_n)\mathbf{v}_n) \\
&= (k_1 + d_1, k_2 + d_2, \ldots, k_n + d_n) \\
&= (k_1, k_2, \ldots, k_n) + (d_1, d_2, \ldots, d_n) \\
&= T(\mathbf{u}) + T(\mathbf{v})
\end{aligned} \]

which shows that T is linear. To show that T is one-to-one, we must show that if u
and v are distinct vectors in V, then so are their images in $R^n$. But if $\mathbf{u} \ne \mathbf{v}$, and if the
representations of these vectors in terms of the basis vectors are as in (3), then we must
have $k_i \ne d_i$ for at least one i. Thus,

\[ T(\mathbf{u}) = (k_1, k_2, \ldots, k_n) \ne (d_1, d_2, \ldots, d_n) = T(\mathbf{v}) \]

which shows that u and v have distinct images under T. Finally, the transformation T is
onto, for if

\[ \mathbf{w} = (k_1, k_2, \ldots, k_n) \]

is any vector in $R^n$, then it follows from (2) that w is the image under T of the vector

\[ \mathbf{u} = k_1\mathbf{v}_1 + k_2\mathbf{v}_2 + \cdots + k_n\mathbf{v}_n \]


Whereas Theorem 8.3.1 tells us, in general, that every n-dimensional vector space is
isomorphic to $R^n$, it is Formula (2) in its proof that tells us how to find isomorphisms.


THEOREM 8.3.2 If $S = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}$ is a basis for a vector space V, then the
coordinate map

\[ \mathbf{u} \;\overset{T}{\longrightarrow}\; (\mathbf{u})_S \]

is an isomorphism between V and $R^n$.


Remark Recall that coordinate maps depend on the order in which the basis vectors are listed.
Thus, Theorem 8.3.2 actually describes many possible isomorphisms, one for each of the n! possible
orders in which the basis vectors can be listed.


► EXAMPLE 1 The Natural Isomorphism Between $P_{n-1}$ and $R^n$

It follows from Theorem 8.3.2 that the coordinate map

\[ a_0 + a_1x + \cdots + a_{n-1}x^{n-1} \;\overset{T}{\longrightarrow}\; (a_0, a_1, \ldots, a_{n-1}) \]

defines an isomorphism between $P_{n-1}$ and $R^n$. This is called the natural isomorphism
between those vector spaces.

► EXAMPLE 2 The Natural Isomorphism Between $M_{22}$ and $R^4$

It follows from Theorem 8.3.2 that the coordinate map

\[ \begin{bmatrix} a & b \\ c & d \end{bmatrix} \;\overset{T}{\longrightarrow}\; (a, b, c, d) \]

defines an isomorphism between $M_{22}$ and $R^4$. This is a special case of the isomorphism
that maps an m x n matrix into its coordinate vector. We call this the natural isomorphism
between $M_{mn}$ and $R^{mn}$.
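The natural isomorphisms of Examples 1 and 2 can be tried out numerically. The following is a minimal sketch, assuming a polynomial is stored as the tuple of its coefficients (which is precisely its image in $R^n$); the helper names `poly_add`, `poly_scale`, `poly_eval`, and `mat22_to_r4` are hypothetical and chosen only for this illustration.

```python
# Hypothetical helpers: the polynomial a0 + a1*x + ... + a_{n-1}*x^(n-1)
# is stored as the tuple (a0, a1, ..., a_{n-1}), i.e. as its image under
# the natural isomorphism of Example 1.

def poly_add(p, q):
    """Add two polynomials given as equal-length coefficient tuples."""
    return tuple(a + b for a, b in zip(p, q))

def poly_scale(k, p):
    """Multiply a polynomial (coefficient tuple) by the scalar k."""
    return tuple(k * a for a in p)

def poly_eval(p, x):
    """Evaluate the polynomial with coefficient tuple p at the number x."""
    return sum(a * x**i for i, a in enumerate(p))

def mat22_to_r4(m):
    """Natural isomorphism of Example 2: [[a, b], [c, d]] -> (a, b, c, d)."""
    (a, b), (c, d) = m
    return (a, b, c, d)

p = (1, 2, 3)    # 1 + 2x + 3x^2
q = (4, 0, -1)   # 4 - x^2
s = poly_add(p, q)      # componentwise sum in R^3
t = poly_scale(5, p)    # componentwise scalar multiple in R^3
```

Adding the tuples componentwise gives the coefficients of the sum polynomial, so evaluating `s` at any point agrees with evaluating `p` and `q` separately and adding the results. This is the sense in which the operations on the two sides of the isomorphism mirror each other.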


CALCULUS REQUIRED


► EXAMPLE 3 Differentiation by Matrix Multiplication

Consider the differentiation transformation $D: P_3 \to P_2$ on the vector space of
polynomials of degree 3 or less. If we map $P_3$ and $P_2$ into $R^4$ and $R^3$, respectively, by
the natural isomorphisms, then the transformation D produces a corresponding matrix
transformation from $R^4$ to $R^3$. Specifically, the derivative transformation

\[ a_0 + a_1x + a_2x^2 + a_3x^3 \;\overset{D}{\longrightarrow}\; a_1 + 2a_2x + 3a_3x^2 \]

produces the matrix transformation

\[ \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 3 \end{bmatrix}
\begin{bmatrix} a_0 \\ a_1 \\ a_2 \\ a_3 \end{bmatrix} =
\begin{bmatrix} a_1 \\ 2a_2 \\ 3a_3 \end{bmatrix} \]

Thus, for example, the derivative

\[ \frac{d}{dx}(2 + x + 4x^2 - x^3) = 1 + 8x - 3x^2 \]

can be calculated as the matrix product

\[ \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 3 \end{bmatrix}
\begin{bmatrix} 2 \\ 1 \\ 4 \\ -1 \end{bmatrix} =
\begin{bmatrix} 1 \\ 8 \\ -3 \end{bmatrix} \]

This idea is useful for constructing numerical algorithms to calculate derivatives.
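As a rough illustration of how Example 3 might look in code, the sketch below builds the differentiation matrix for $P_n \to P_{n-1}$ and applies it to a coefficient vector. The function names `differentiation_matrix` and `mat_vec` are assumptions made for this sketch, and polynomials are assumed to be stored as coefficient lists in order of increasing degree.

```python
def differentiation_matrix(n):
    """Return the n x (n+1) matrix of D: P_n -> P_{n-1} relative to the
    standard bases {1, x, ..., x^n} and {1, x, ..., x^(n-1)}."""
    # Row i picks out (i+1) * a_{i+1}, the coefficient of x^i in p'(x).
    return [[(j if j == i + 1 else 0) for j in range(n + 1)]
            for i in range(n)]

def mat_vec(A, v):
    """Multiply the matrix A (list of rows) by the column vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

D = differentiation_matrix(3)
p = [2, 1, 4, -1]          # 2 + x + 4x^2 - x^3
dp = mat_vec(D, p)         # coefficients of the derivative, lowest degree first
```

For the polynomial in Example 3 the product reproduces the coefficient vector of $1 + 8x - 3x^2$.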

► EXAMPLE 4 Working with Isomorphisms

Use the natural isomorphism between $P_5$ and $R^6$ to determine whether the following
polynomials are linearly independent.

\[ \mathbf{p}_1 = 1 + 2x - 3x^2 + 4x^3 + x^5 \]

\[ \mathbf{p}_2 = 1 + 3x - 4x^2 + 6x^3 + 5x^4 + 4x^5 \]

\[ \mathbf{p}_3 = 3 + 8x - 11x^2 + 16x^3 + 10x^4 + 9x^5 \]

Solution We will convert this to a matrix problem by creating a matrix whose rows
are the coordinate vectors of the polynomials under the natural isomorphism and then
determine whether those rows are linearly independent using elementary row operations.
The matrix whose rows are the coordinate vectors of the polynomials under the natural
isomorphism is

\[ A = \begin{bmatrix} 1 & 2 & -3 & 4 & 0 & 1 \\ 1 & 3 & -4 & 6 & 5 & 4 \\ 3 & 8 & -11 & 16 & 10 & 9 \end{bmatrix} \]

We leave it for you to use elementary row operations to reduce this matrix to the row
echelon form

\[ R = \begin{bmatrix} 1 & 2 & -3 & 4 & 0 & 1 \\ 0 & 1 & -1 & 2 & 5 & 3 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix} \]

This matrix has only two nonzero rows, so the row space of A is two-dimensional, which
means that its row vectors are linearly dependent. Hence so are the given polynomials.
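The row reduction left to the reader in Example 4 can be checked mechanically. The sketch below is one possible way to do so, using exact rational arithmetic from the standard library; the function name `rank` is an assumption of this illustration, and the rows are the coordinate vectors of the three polynomials under the natural isomorphism.

```python
from fractions import Fraction

def rank(rows):
    """Return the rank of a matrix (list of rows) by forward elimination,
    using exact fractions so no rounding error can hide a zero row."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0                                  # index of the next pivot row
    for c in range(len(m[0])):
        # Find a row at or below r with a nonzero entry in column c.
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        # Clear the entries below the pivot.
        for i in range(r + 1, len(m)):
            factor = m[i][c] / m[r][c]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r

A = [
    [1, 2, -3, 4, 0, 1],     # coordinates of p1
    [1, 3, -4, 6, 5, 4],     # coordinates of p2
    [3, 8, -11, 16, 10, 9],  # coordinates of p3
]
dependent = rank(A) < len(A)
```

A rank smaller than the number of rows means the rows, and hence the polynomials, are linearly dependent. Here the dependence can also be seen directly, since the third row equals the first row plus twice the second.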

Inner Product Space Isomorphisms

In the case where V is a real n-dimensional inner product space, both V and $R^n$ have, in
addition to their algebraic structure, a geometric structure arising from their respective
inner products. Thus, it is reasonable to inquire if there exists an isomorphism from V to
$R^n$ that preserves the geometric structure as well as the algebraic structure. For example,
we would want orthogonal vectors in V to have orthogonal counterparts in $R^n$, and we
would want orthonormal sets in V to correspond to orthonormal sets in $R^n$.

In order for an isomorphism to preserve geometric structure, it obviously has to
preserve inner products, since notions of length, angle, and orthogonality are all based
on the inner product. Thus, if V and W are inner product spaces, then we call an
isomorphism $T: V \to W$ an inner product space isomorphism if

\[ \langle T(\mathbf{u}), T(\mathbf{v}) \rangle = \langle \mathbf{u}, \mathbf{v} \rangle \quad \text{for all } \mathbf{u} \text{ and } \mathbf{v} \text{ in } V \]

The following analog of Theorem 8.3.2 provides an important method for obtaining
inner product space isomorphisms between real inner product spaces and Euclidean
vector spaces.


THEOREM 8.3.3 If $S = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}$ is an ordered orthonormal basis for a real
inner product space V, then the coordinate map

\[ \mathbf{u} \;\overset{T}{\longrightarrow}\; (\mathbf{u})_S \]

is an inner product space isomorphism between V and the vector space $R^n$ with the
Euclidean inner product.


► EXAMPLE 5 An Inner Product Space Isomorphism

We saw in Example 1 that the coordinate map

\[ a_0 + a_1x + \cdots + a_{n-1}x^{n-1} \;\overset{T}{\longrightarrow}\; (a_0, a_1, \ldots, a_{n-1}) \]

with respect to the standard basis for $P_{n-1}$ is an isomorphism between $P_{n-1}$ and $R^n$.
However, the standard basis is orthonormal with respect to the standard inner product
on $P_{n-1}$ (see Example 3 of Section 6.3), so it follows that T is actually an inner product
space isomorphism with respect to the standard inner product on $P_{n-1}$ and the Euclidean
inner product on $R^n$. To verify that this is so, recall from Example 7 of Section 6.1 that
the standard inner product on $P_{n-1}$ of two vectors

\[ \mathbf{p} = a_0 + a_1x + \cdots + a_{n-1}x^{n-1} \quad\text{and}\quad \mathbf{q} = b_0 + b_1x + \cdots + b_{n-1}x^{n-1} \]

is

\[ \langle \mathbf{p}, \mathbf{q} \rangle = a_0b_0 + a_1b_1 + \cdots + a_{n-1}b_{n-1} \]

But this is exactly the Euclidean inner product on $R^n$ of the n-tuples

\[ (a_0, a_1, \ldots, a_{n-1}) \quad\text{and}\quad (b_0, b_1, \ldots, b_{n-1}) \]


► EXAMPLE 6 A Notational Matter

Let $R^n$ be the vector space of real n-tuples in comma-delimited form, let $M_n$ be the vector
space of real n x 1 matrices, let $R^n$ have the Euclidean inner product $\langle \mathbf{u}, \mathbf{v} \rangle = \mathbf{u} \cdot \mathbf{v}$, and
let $M_n$ have the inner product $\langle \mathbf{u}, \mathbf{v} \rangle = \mathbf{u}^T\mathbf{v}$ in which u and v are expressed in column
form. The mapping $T: R^n \to M_n$ defined by

\[ (v_1, v_2, \ldots, v_n) \;\overset{T}{\longrightarrow}\; \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix} \]

is an inner product space isomorphism, so the distinction between the inner product
space $R^n$ and the inner product space $M_n$ is essentially notational, a fact that we have
used many times in this text.
Exercise Set 8.3


In Exercises 1-8, state whether the transformation is an isomorphism.
No proof required.

1. $c_0 + c_1x \to (c_0 - c_1, c_1)$ from $P_1$ to $R^2$.

2. $(x, y) \to (x, y, 0)$ from $R^2$ to $R^3$.

3. $a + bx + cx^2 + dx^3 \to \begin{bmatrix} a & b \\ c & d \end{bmatrix}$ from $P_3$ to $M_{22}$.

4. $\begin{bmatrix} a & b \\ c & d \end{bmatrix} \to ad - bc$ from $M_{22}$ to $R$.

5. $(a, b, c, d) \to a + bx + cx^2 + (d + 1)x^3$ from $R^4$ to $P_3$.

6. $A \to A^T$ from $M_{mn}$ to $M_{nm}$.

7. $c_1 \sin x + c_2 \cos x \to (c_1, c_2)$ from the subspace of $C(-\infty, \infty)$
spanned by $S = \{\sin x, \cos x\}$ to $R^2$.

8. The map $(u_1, u_2, \ldots, u_n, \ldots) \to (0, u_1, u_2, \ldots, u_n, \ldots)$ from
$R^\infty$ to $R^\infty$.


9. (a) Find an isomorphism between the vector space of all 3 x 3 
symmetric matrices and R 6 . 

(b) Find two different isomorphisms between the vector space 
of all 2 x 2 matrices and R 4 . 


10. (a) Find an isomorphism between the vector space of all polynomials
of degree at most 3 such that $p(0) = 0$ and $R^3$.

(b) Find an isomorphism between the vector spaces
$\operatorname{span}\{1, \sin(x), \cos(x)\}$ and $R^3$.

17. Do you think that $R^2$ is isomorphic to the xy-plane in $R^3$?
Justify your answer.

18. (a) For what value or values of k, if any, is $M_{mn}$ isomorphic
to $R^k$?

(b) For what value or values of k, if any, is $M_{mn}$ isomorphic
to $P_k$?

16. $\begin{bmatrix} a & b \\ c & d \end{bmatrix} \to \begin{bmatrix} a + b \\ a + b \\ a + b + c \\ a + b + c + d \end{bmatrix}$

19. Let $T: P_2 \to M_{22}$ be the mapping

\[ T(\mathbf{p}) = T(p(x)) = \begin{bmatrix} p(0) & p(1) \\ p(1) & p(0) \end{bmatrix} \]

Is this an isomorphism? Justify your answer.


20. Show that if $M_{22}$ and $P_3$ have the standard inner products
given in Examples 6 and 7 of Section 6.1, then the mapping

\[ \begin{bmatrix} a_0 & a_1 \\ a_2 & a_3 \end{bmatrix} \to a_0 + a_1x + a_2x^2 + a_3x^3 \]

is an inner product space isomorphism between those spaces.


21. (Calculus required) Devise a method for using matrix multiplication
to differentiate functions in the vector space
$\operatorname{span}\{1, \sin(x), \cos(x), \sin(2x), \cos(2x)\}$. Use your method
to find the derivative of $3 - 4\sin(x) + \sin(2x) + 5\cos(2x)$.


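In Exercises 11-12, determine whether the matrix transformation
$T_A: R^3 \to R^3$ is an isomorphism.

11. $A = \begin{bmatrix} 0 & 1 & -1 \\ 1 & 0 & 2 \\ -1 & 1 & 0 \end{bmatrix}$

12. $A = \begin{bmatrix} 1 & -1 & 0 \\ 0 & 0 & 2 \\ -1 & 1 & 0 \end{bmatrix}$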


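In Exercises 13-14, find the dimension n of the solution space
W of $A\mathbf{x} = \mathbf{0}$, and then construct an isomorphism between W and
$R^n$.

13. $A = \begin{bmatrix} 1 & 0 & 1 & 0 \\ 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ 0 & 1 & 0 & 1 \end{bmatrix}$

14. $A = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 2 & 2 & 2 \\ 3 & 3 & 3 & 3 \end{bmatrix}$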


In Exercises 15-16, determine whether the transformation is
an isomorphism from $M_{22}$ to $R^4$.

15. $\begin{bmatrix} a & b \\ c & d \end{bmatrix} \to \begin{bmatrix} a \\ a + b \\ a + b + c \\ a + b + c + d \end{bmatrix}$


Working with Proofs

22. Prove that if $T: V \to W$ is an isomorphism, then so is
$T^{-1}: W \to V$.

23. Prove that if U, V, and W are vector spaces such that U is
isomorphic to V and V is isomorphic to W, then U is isomorphic
to W.

24. Use the result in Exercise 22 to prove that any two real
finite-dimensional vector spaces with the same dimension are
isomorphic to one another.

25. Prove that an inner product space isomorphism preserves angles
and distances; that is, the angle between u and v in
V is equal to the angle between $T(\mathbf{u})$ and $T(\mathbf{v})$ in W, and

\[ \|\mathbf{u} - \mathbf{v}\|_V = \|T(\mathbf{u}) - T(\mathbf{v})\|_W \]

26. Prove that an inner product space isomorphism maps
orthonormal sets into orthonormal sets.

True-False Exercises

TF. In parts (a)-(f) determine whether the statement is true or
false, and justify your answer.

(a) The vector spaces $R^2$ and $P_2$ are isomorphic.

(b) If the kernel of a linear transformation $T: P_2 \to P_3$ is $\{\mathbf{0}\}$, then
T is an isomorphism.

(c) Every linear transformation from $M_{33}$ to $P_9$ is an isomorphism.

(d) There is a subspace of $M_{23}$ that is isomorphic to $R^4$.

(e) Isomorphic finite-dimensional vector spaces must have the
same number of basis vectors.

(f) $R^n$ is isomorphic to a subspace of $R^{n+1}$.


8.4 Matrices for General Linear Transformations

In this section we will show that a general linear transformation from any n-dimensional
vector space V to any m-dimensional vector space W can be performed using an
appropriate matrix transformation from $R^n$ to $R^m$. This idea is used in computer
computations since computers are well suited for performing matrix computations.

Matrices of Linear Transformations

Suppose that V is an n-dimensional vector space, that W is an m-dimensional vector
space, and that $T: V \to W$ is a linear transformation. Suppose further that B is a basis
for V, that $B'$ is a basis for W, and that for each vector x in V, the coordinate matrices
for x and $T(\mathbf{x})$ are $[\mathbf{x}]_B$ and $[T(\mathbf{x})]_{B'}$, respectively (Figure 8.4.1).


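► Figure 8.4.1 T maps a vector x in V (n-dimensional) to the vector $T(\mathbf{x})$ in W
(m-dimensional); their coordinate vectors $[\mathbf{x}]_B$ and $[T(\mathbf{x})]_{B'}$ are vectors in $R^n$
and $R^m$, respectively.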

It will be our goal to find an m x n matrix A such that multiplication by A maps
the vector $[\mathbf{x}]_B$ into the vector $[T(\mathbf{x})]_{B'}$ for each x in V (Figure 8.4.2a). If we can do so,
then, as illustrated in Figure 8.4.2b, we will be able to execute the linear transformation
T by using matrix multiplication and the following indirect procedure:


Finding T(x) Indirectly

Step 1. Compute the coordinate vector $[\mathbf{x}]_B$.

Step 2. Multiply $[\mathbf{x}]_B$ on the left by A to produce $[T(\mathbf{x})]_{B'}$.

Step 3. Reconstruct $T(\mathbf{x})$ from its coordinate vector $[T(\mathbf{x})]_{B'}$.

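► Figure 8.4.2 (a) T maps V into W, taking x to $T(\mathbf{x})$; multiplication by A maps
$R^n$ into $R^m$, taking $[\mathbf{x}]_B$ to $[T(\mathbf{x})]_{B'}$. (b) Direct computation of $T(\mathbf{x})$ from x
compared with the indirect route: (1) compute $[\mathbf{x}]_B$, (2) multiply by A,
(3) reconstruct $T(\mathbf{x})$ from $[T(\mathbf{x})]_{B'}$.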
The key to executing this plan is to find an m x n matrix A with the property that

\[ A[\mathbf{x}]_B = [T(\mathbf{x})]_{B'} \tag{1} \]

For this purpose, let $B = \{\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_n\}$ be a basis for the n-dimensional space V and
$B' = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_m\}$ a basis for the m-dimensional space W. Since Equation (1) must
hold for all vectors in V, it must hold, in particular, for the basis vectors in B; that is,

\[ A[\mathbf{u}_1]_B = [T(\mathbf{u}_1)]_{B'}, \quad A[\mathbf{u}_2]_B = [T(\mathbf{u}_2)]_{B'}, \quad \ldots, \quad A[\mathbf{u}_n]_B = [T(\mathbf{u}_n)]_{B'} \tag{2} \]


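But

\[ [\mathbf{u}_1]_B = \begin{bmatrix} 1 \\ 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \quad
[\mathbf{u}_2]_B = \begin{bmatrix} 0 \\ 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \quad \ldots, \quad
[\mathbf{u}_n]_B = \begin{bmatrix} 0 \\ 0 \\ 0 \\ \vdots \\ 1 \end{bmatrix} \]

so

\[ A[\mathbf{u}_1]_B = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix}
\begin{bmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix} =
\begin{bmatrix} a_{11} \\ a_{21} \\ \vdots \\ a_{m1} \end{bmatrix} \]

\[ A[\mathbf{u}_2]_B = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix}
\begin{bmatrix} 0 \\ 1 \\ \vdots \\ 0 \end{bmatrix} =
\begin{bmatrix} a_{12} \\ a_{22} \\ \vdots \\ a_{m2} \end{bmatrix} \]

\[ \vdots \]

\[ A[\mathbf{u}_n]_B = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix}
\begin{bmatrix} 0 \\ 0 \\ \vdots \\ 1 \end{bmatrix} =
\begin{bmatrix} a_{1n} \\ a_{2n} \\ \vdots \\ a_{mn} \end{bmatrix} \]

Substituting these results into (2) yields

\[ \begin{bmatrix} a_{11} \\ a_{21} \\ \vdots \\ a_{m1} \end{bmatrix} = [T(\mathbf{u}_1)]_{B'}, \quad
\begin{bmatrix} a_{12} \\ a_{22} \\ \vdots \\ a_{m2} \end{bmatrix} = [T(\mathbf{u}_2)]_{B'}, \quad \ldots, \quad
\begin{bmatrix} a_{1n} \\ a_{2n} \\ \vdots \\ a_{mn} \end{bmatrix} = [T(\mathbf{u}_n)]_{B'} \]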

which shows that the successive columns of A are the coordinate vectors of

\[ T(\mathbf{u}_1), \; T(\mathbf{u}_2), \; \ldots, \; T(\mathbf{u}_n) \]

with respect to the basis $B'$. Thus, the matrix A that completes the link in Figure 8.4.2a is

\[ A = \bigl[\, [T(\mathbf{u}_1)]_{B'} \mid [T(\mathbf{u}_2)]_{B'} \mid \cdots \mid [T(\mathbf{u}_n)]_{B'} \,\bigr] \tag{3} \]

We will call this the matrix for T relative to the bases B and $B'$ and will denote it by the
symbol $[T]_{B',B}$. Using this notation, Formula (3) can be written as

\[ [T]_{B',B} = \bigl[\, [T(\mathbf{u}_1)]_{B'} \mid [T(\mathbf{u}_2)]_{B'} \mid \cdots \mid [T(\mathbf{u}_n)]_{B'} \,\bigr] \tag{4} \]
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Basis for the Basis for the 
image space domain 

▲ Figure 8.4.3 

= [t(x)] b . 

ft, 

Cancellation 


▲ Figure 8.4.4 


and from ( 1 ), this matrix has the property 


[fli'jWs = [r(x)] B / ( 5 ) 

We leave it as an exercise to show that in the special case where Tq'. R" R m is multi- 
plication by C, and where B and B' are the standard bases for R n and R m , respectively, 
then 

[T c ] B ',b = C ( 6 ) 


Remark Observe that in the notation [T} B : B the right subscript is a basis for the domain of T, 
and the left subscript is a basis for the image space of T (Figure 8.4.3). Moreover, observe how 
the subscript B seems to “cancel out” in Formula (5) (Figure 8.4.4). 


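► EXAMPLE 1 Matrix for a Linear Transformation

Let $T: P_1 \to P_2$ be the linear transformation defined by

\[ T(p(x)) = xp(x) \]

Find the matrix for T with respect to the standard bases

\[ B = \{\mathbf{u}_1, \mathbf{u}_2\} \quad\text{and}\quad B' = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\} \]

where

\[ \mathbf{u}_1 = 1, \quad \mathbf{u}_2 = x; \qquad \mathbf{v}_1 = 1, \quad \mathbf{v}_2 = x, \quad \mathbf{v}_3 = x^2 \]

Solution From the given formula for T we obtain

\[ T(\mathbf{u}_1) = T(1) = (x)(1) = x \]

\[ T(\mathbf{u}_2) = T(x) = (x)(x) = x^2 \]

By inspection, the coordinate vectors for $T(\mathbf{u}_1)$ and $T(\mathbf{u}_2)$ relative to $B'$ are

\[ [T(\mathbf{u}_1)]_{B'} = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}, \quad
[T(\mathbf{u}_2)]_{B'} = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} \]

Thus, the matrix for T with respect to B and $B'$ is

\[ [T]_{B',B} = \bigl[\, [T(\mathbf{u}_1)]_{B'} \mid [T(\mathbf{u}_2)]_{B'} \,\bigr] = \begin{bmatrix} 0 & 0 \\ 1 & 0 \\ 0 & 1 \end{bmatrix} \]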


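► EXAMPLE 2 The Three-Step Procedure

Let $T: P_1 \to P_2$ be the linear transformation in Example 1, and use the three-step
procedure described in the following figure to perform the computation

\[ T(a + bx) = x(a + bx) = ax + bx^2 \]

[Figure: direct computation takes x to $T(\mathbf{x})$; the indirect route is (1) x to $[\mathbf{x}]_B$,
(2) multiply by $[T]_{B',B}$ to obtain $[T(\mathbf{x})]_{B'}$, (3) $[T(\mathbf{x})]_{B'}$ to $T(\mathbf{x})$.]

Although Example 2 is simple, the procedure that it illustrates is applicable to problems
of great complexity.

Solution

Step 1. The coordinate matrix for $\mathbf{x} = a + bx$ relative to the basis $B = \{1, x\}$ is

\[ [\mathbf{x}]_B = \begin{bmatrix} a \\ b \end{bmatrix} \]

Step 2. Multiplying $[\mathbf{x}]_B$ by the matrix $[T]_{B',B}$ found in Example 1 we obtain

\[ [T]_{B',B}[\mathbf{x}]_B = \begin{bmatrix} 0 & 0 \\ 1 & 0 \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} a \\ b \end{bmatrix} = \begin{bmatrix} 0 \\ a \\ b \end{bmatrix} = [T(\mathbf{x})]_{B'} \]

Step 3. Reconstructing $T(\mathbf{x}) = T(a + bx)$ from $[T(\mathbf{x})]_{B'}$ we obtain

\[ T(a + bx) = 0 + ax + bx^2 = ax + bx^2 \]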
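The three steps of Example 2 translate almost line for line into code. The sketch below is only illustrative: it assumes polynomials are represented by their coordinate lists relative to the standard bases, and the names `matrix_from_columns`, `mat_vec`, and `apply_T` are hypothetical.

```python
def matrix_from_columns(columns):
    """Assemble a matrix (list of rows) whose columns are the given vectors,
    as in Formula (4)."""
    return [list(row) for row in zip(*columns)]

def mat_vec(A, v):
    """Multiply the matrix A (list of rows) by the column vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

# Columns are [T(1)]_{B'} = [x]_{B'} and [T(x)]_{B'} = [x^2]_{B'},
# with B' = {1, x, x^2}, exactly as found in Example 1.
T_matrix = matrix_from_columns([[0, 1, 0], [0, 0, 1]])

def apply_T(a, b):
    """Compute T(a + bx) = x(a + bx) by the three-step procedure."""
    coords = [a, b]                            # Step 1: [x]_B
    image_coords = mat_vec(T_matrix, coords)   # Step 2: [T]_{B',B} [x]_B
    return image_coords                        # Step 3: coefficients of 1, x, x^2

result = apply_T(4, 7)   # represents 0 + 4x + 7x^2
```

The returned list is read back as a polynomial in Step 3, so `apply_T(a, b)` always has the form `[0, a, b]`, matching $ax + bx^2$.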


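► EXAMPLE 3 Matrix for a Linear Transformation

Let $T: R^2 \to R^3$ be the linear transformation defined by

\[ T\left(\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}\right) =
\begin{bmatrix} x_2 \\ -5x_1 + 13x_2 \\ -7x_1 + 16x_2 \end{bmatrix} =
\begin{bmatrix} 0 & 1 \\ -5 & 13 \\ -7 & 16 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \]

Find the matrix for the transformation T with respect to the bases $B = \{\mathbf{u}_1, \mathbf{u}_2\}$ for $R^2$
and $B' = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ for $R^3$, where

\[ \mathbf{u}_1 = \begin{bmatrix} 3 \\ 1 \end{bmatrix}, \quad
\mathbf{u}_2 = \begin{bmatrix} 5 \\ 2 \end{bmatrix}; \qquad
\mathbf{v}_1 = \begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix}, \quad
\mathbf{v}_2 = \begin{bmatrix} -1 \\ 2 \\ 2 \end{bmatrix}, \quad
\mathbf{v}_3 = \begin{bmatrix} 0 \\ 1 \\ 2 \end{bmatrix} \]

Solution From the formula for T,

\[ T(\mathbf{u}_1) = \begin{bmatrix} 1 \\ -2 \\ -5 \end{bmatrix}, \quad
T(\mathbf{u}_2) = \begin{bmatrix} 2 \\ 1 \\ -3 \end{bmatrix} \]

Expressing these vectors as linear combinations of $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$, we obtain (verify)

\[ T(\mathbf{u}_1) = \mathbf{v}_1 - 2\mathbf{v}_3, \quad T(\mathbf{u}_2) = 3\mathbf{v}_1 + \mathbf{v}_2 - \mathbf{v}_3 \]

Thus,

\[ [T(\mathbf{u}_1)]_{B'} = \begin{bmatrix} 1 \\ 0 \\ -2 \end{bmatrix}, \quad
[T(\mathbf{u}_2)]_{B'} = \begin{bmatrix} 3 \\ 1 \\ -1 \end{bmatrix} \]

so

\[ [T]_{B',B} = \bigl[\, [T(\mathbf{u}_1)]_{B'} \mid [T(\mathbf{u}_2)]_{B'} \,\bigr] =
\begin{bmatrix} 1 & 3 \\ 0 & 1 \\ -2 & -1 \end{bmatrix} \]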

Remark Example 3 illustrates that a fixed linear transformation generally has multiple
representations, each depending on the bases chosen. In this case the matrices

\[ [T] = \begin{bmatrix} 0 & 1 \\ -5 & 13 \\ -7 & 16 \end{bmatrix} \quad\text{and}\quad
[T]_{B',B} = \begin{bmatrix} 1 & 3 \\ 0 & 1 \\ -2 & -1 \end{bmatrix} \]

both represent the transformation T, the first relative to the standard bases for $R^2$ and $R^3$, the
second relative to the bases B and $B'$ stated in the example.
Matrices of Linear Operators

In the special case where V = W (so that $T: V \to V$ is a linear operator), it is usual to
take $B = B'$ when constructing a matrix for T. In this case the resulting matrix is called
the matrix for T relative to the basis B and is usually denoted by $[T]_B$ rather than $[T]_{B,B}$.
If $B = \{\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_n\}$, then Formulas (4) and (5) become

\[ [T]_B = \bigl[\, [T(\mathbf{u}_1)]_B \mid [T(\mathbf{u}_2)]_B \mid \cdots \mid [T(\mathbf{u}_n)]_B \,\bigr] \tag{7} \]

\[ [T]_B[\mathbf{x}]_B = [T(\mathbf{x})]_B \tag{8} \]

Phrased informally, Formulas (7) and (8) state that the matrix for T, when multiplied by
the coordinate vector for x, produces the coordinate vector for $T(\mathbf{x})$.

In the special case where $T: R^n \to R^n$ is a matrix operator, say multiplication by A, and
B is the standard basis for $R^n$, then Formula (7) simplifies to

\[ [T]_B = A \tag{9} \]


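Matrices of Identity Operators

Recall that the identity operator $I: V \to V$ maps every vector in V into itself, that is,
$I(\mathbf{x}) = \mathbf{x}$ for every vector x in V. The following example shows that if V is n-dimensional,
then the matrix for I relative to any basis B for V is the n x n identity matrix.


► EXAMPLE 4 Matrices of Identity Operators

If $B = \{\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_n\}$ is a basis for a finite-dimensional vector space V, and if $I: V \to V$
is the identity operator on V, then

\[ I(\mathbf{u}_1) = \mathbf{u}_1, \quad I(\mathbf{u}_2) = \mathbf{u}_2, \quad \ldots, \quad I(\mathbf{u}_n) = \mathbf{u}_n \]

Therefore,

\[ [I]_B = \bigl[\, [I(\mathbf{u}_1)]_B \mid [I(\mathbf{u}_2)]_B \mid \cdots \mid [I(\mathbf{u}_n)]_B \,\bigr] =
\begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & 1 \end{bmatrix} = I \]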


► EXAMPLE 5 Linear Operator on $P_2$

Let $T: P_2 \to P_2$ be the linear operator defined by

\[ T(p(x)) = p(3x - 5) \]

that is, $T(c_0 + c_1x + c_2x^2) = c_0 + c_1(3x - 5) + c_2(3x - 5)^2$.

(a) Find $[T]_B$ relative to the basis $B = \{1, x, x^2\}$.

(b) Use the indirect procedure to compute $T(1 + 2x + 3x^2)$.

(c) Check the result in (b) by computing $T(1 + 2x + 3x^2)$ directly.

Solution (a) From the formula for T,

\[ T(1) = 1, \quad T(x) = 3x - 5, \quad T(x^2) = (3x - 5)^2 = 9x^2 - 30x + 25 \]

so

\[ [T(1)]_B = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \quad
[T(x)]_B = \begin{bmatrix} -5 \\ 3 \\ 0 \end{bmatrix}, \quad
[T(x^2)]_B = \begin{bmatrix} 25 \\ -30 \\ 9 \end{bmatrix} \]

Thus,

\[ [T]_B = \begin{bmatrix} 1 & -5 & 25 \\ 0 & 3 & -30 \\ 0 & 0 & 9 \end{bmatrix} \]

Solution (b)

Step 1. The coordinate matrix for $\mathbf{p} = 1 + 2x + 3x^2$ relative to the basis $B = \{1, x, x^2\}$
is

\[ [\mathbf{p}]_B = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} \]

Step 2. Multiplying $[\mathbf{p}]_B$ by the matrix $[T]_B$ found in part (a) we obtain

\[ [T]_B[\mathbf{p}]_B = \begin{bmatrix} 1 & -5 & 25 \\ 0 & 3 & -30 \\ 0 & 0 & 9 \end{bmatrix}
\begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} = \begin{bmatrix} 66 \\ -84 \\ 27 \end{bmatrix} = [T(\mathbf{p})]_B \]

Step 3. Reconstructing $T(\mathbf{p}) = T(1 + 2x + 3x^2)$ from $[T(\mathbf{p})]_B$ we obtain

\[ T(1 + 2x + 3x^2) = 66 - 84x + 27x^2 \]

Solution (c) By direct computation,

\[ \begin{aligned}
T(1 + 2x + 3x^2) &= 1 + 2(3x - 5) + 3(3x - 5)^2 \\
&= 1 + 6x - 10 + 27x^2 - 90x + 75 \\
&= 66 - 84x + 27x^2
\end{aligned} \]

which agrees with the result in (b).
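Parts (a) through (c) of Example 5 can be reproduced with a few lines of code. The following is a sketch under the assumption that polynomials are stored as coefficient lists in order of increasing degree; the names `poly_mul`, `substitution_matrix`, and `mat_vec` are hypothetical helpers introduced only here.

```python
def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists (lowest degree first)."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def substitution_matrix(inner, n):
    """Matrix, relative to {1, x, ..., x^n}, of the operator p(x) -> p(inner(x)),
    where inner is a polynomial of degree at most 1 given as [constant, slope]."""
    columns = []
    power = [1]                          # inner(x)^0
    for _ in range(n + 1):
        # Column k holds the coordinates of T(x^k) = inner(x)^k, padded to length n+1.
        columns.append(power + [0] * (n + 1 - len(power)))
        power = poly_mul(power, inner)
    return [list(row) for row in zip(*columns)]

def mat_vec(A, v):
    """Multiply the matrix A (list of rows) by the column vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

T_matrix = substitution_matrix([-5, 3], 2)     # inner(x) = 3x - 5 on P_2
image = mat_vec(T_matrix, [1, 2, 3])           # coordinates of T(1 + 2x + 3x^2)
```

The columns are built from successive powers of $3x - 5$, which is the computation carried out by hand in part (a); the matrix-vector product then plays the role of Step 2 in part (b).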

Matrices of Compositions and Inverse Transformations

We will conclude this section by mentioning two theorems without proof that are
generalizations of Formulas (4) and (9) of Section 4.10.

THEOREM 8.4.1 If $T_1: U \to V$ and $T_2: V \to W$ are linear transformations, and if B,
$B''$, and $B'$ are bases for U, V, and W, respectively, then

\[ [T_2 \circ T_1]_{B',B} = [T_2]_{B',B''}[T_1]_{B'',B} \tag{10} \]

THEOREM 8.4.2 If $T: V \to V$ is a linear operator, and if B is a basis for V, then the
following are equivalent.

(a) T is one-to-one.

(b) $[T]_B$ is invertible.

Moreover, when these equivalent conditions hold,

\[ [T^{-1}]_B = [T]_B^{-1} \tag{11} \]


Remark In (10), observe how the interior subscript $B''$ (the basis for the intermediate space V)
seems to "cancel out," leaving only the bases for the domain and image space of the composition
as subscripts (Figure 8.4.5). This "cancellation" of interior subscripts suggests the following
extension of Formula (10) to compositions of three linear transformations (Figure 8.4.6):

\[ [T_3 \circ T_2 \circ T_1]_{B',B} = [T_3]_{B',B'''}[T_2]_{B''',B''}[T_1]_{B'',B} \tag{12} \]

▲ Figure 8.4.5 Cancellation of the interior subscript $B''$ in
$[T_2 \circ T_1]_{B',B} = [T_2]_{B',B''}[T_1]_{B'',B}$.

▲ Figure 8.4.6 A composition of three linear transformations, with bases B, $B''$, $B'''$,
and $B'$ for the successive spaces.

The following example illustrates Theorem 8.4.1. 


► EXAMPLE 6 Composition

Let $T_1: P_1 \to P_2$ be the linear transformation defined by

\[ T_1(p(x)) = xp(x) \]

and let $T_2: P_2 \to P_2$ be the linear operator defined by

\[ T_2(p(x)) = p(3x - 5) \]

Then the composition $(T_2 \circ T_1): P_1 \to P_2$ is given by

\[ (T_2 \circ T_1)(p(x)) = T_2(T_1(p(x))) = T_2(xp(x)) = (3x - 5)p(3x - 5) \]

Thus, if $p(x) = c_0 + c_1x$, then

\[ \begin{aligned}
(T_2 \circ T_1)(c_0 + c_1x) &= (3x - 5)(c_0 + c_1(3x - 5)) \\
&= c_0(3x - 5) + c_1(3x - 5)^2
\end{aligned} \tag{13} \]

In this example, $P_1$ plays the role of U in Theorem 8.4.1, and $P_2$ plays the roles of both
V and W; thus we can take $B' = B''$ in (10) so that the formula simplifies to

\[ [T_2 \circ T_1]_{B',B} = [T_2]_{B'}[T_1]_{B',B} \tag{14} \]

Let us choose $B = \{1, x\}$ to be the basis for $P_1$ and choose $B' = \{1, x, x^2\}$ to be the basis
for $P_2$. We showed in Examples 1 and 5 that

\[ [T_1]_{B',B} = \begin{bmatrix} 0 & 0 \\ 1 & 0 \\ 0 & 1 \end{bmatrix} \quad\text{and}\quad
[T_2]_{B'} = \begin{bmatrix} 1 & -5 & 25 \\ 0 & 3 & -30 \\ 0 & 0 & 9 \end{bmatrix} \]

Thus, it follows from (14) that

\[ [T_2 \circ T_1]_{B',B} = \begin{bmatrix} 1 & -5 & 25 \\ 0 & 3 & -30 \\ 0 & 0 & 9 \end{bmatrix}
\begin{bmatrix} 0 & 0 \\ 1 & 0 \\ 0 & 1 \end{bmatrix} =
\begin{bmatrix} -5 & 25 \\ 3 & -30 \\ 0 & 9 \end{bmatrix} \tag{15} \]

As a check, we will calculate $[T_2 \circ T_1]_{B',B}$ directly from Formula (4). Since $B = \{1, x\}$,
it follows from Formula (4) with $\mathbf{u}_1 = 1$ and $\mathbf{u}_2 = x$ that

\[ [T_2 \circ T_1]_{B',B} = \bigl[\, [(T_2 \circ T_1)(1)]_{B'} \mid [(T_2 \circ T_1)(x)]_{B'} \,\bigr] \tag{16} \]

Using (13) yields

\[ (T_2 \circ T_1)(1) = 3x - 5 \quad\text{and}\quad (T_2 \circ T_1)(x) = (3x - 5)^2 = 9x^2 - 30x + 25 \]

From this and the fact that $B' = \{1, x, x^2\}$, it follows that

\[ [(T_2 \circ T_1)(1)]_{B'} = \begin{bmatrix} -5 \\ 3 \\ 0 \end{bmatrix} \quad\text{and}\quad
[(T_2 \circ T_1)(x)]_{B'} = \begin{bmatrix} 25 \\ -30 \\ 9 \end{bmatrix} \]

Substituting in (16) yields

\[ [T_2 \circ T_1]_{B',B} = \begin{bmatrix} -5 & 25 \\ 3 & -30 \\ 0 & 9 \end{bmatrix} \]

which agrees with (15).
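Formula (14) can be checked numerically for the two matrices of Example 6. The sketch below is illustrative only; `mat_mul` and `mat_vec` are hypothetical helper names, and the matrices are those found in Examples 1 and 5.

```python
def mat_mul(A, B):
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def mat_vec(A, v):
    """Multiply the matrix A (list of rows) by the column vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

T1 = [[0, 0], [1, 0], [0, 1]]                    # [T1]_{B',B} from Example 1
T2 = [[1, -5, 25], [0, 3, -30], [0, 0, 9]]       # [T2]_{B'} from Example 5

composite = mat_mul(T2, T1)                      # [T2 o T1]_{B',B} by Formula (14)

# Apply the composite to p(x) = 2 + x, i.e. coordinates [2, 1] relative to B.
image = mat_vec(composite, [2, 1])
```

By Formula (13), $(T_2 \circ T_1)(2 + x) = 2(3x - 5) + (3x - 5)^2 = 15 - 24x + 9x^2$, so the coordinate vector produced by the composite matrix should be $(15, -24, 9)$.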
Exercise Set 8.4


1. Let $T: P_2 \to P_3$ be the linear transformation defined by
$T(p(x)) = xp(x)$.

(a) Find the matrix for T relative to the standard bases

\[ B = \{\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3\} \quad\text{and}\quad B' = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3, \mathbf{v}_4\} \]

where

\[ \mathbf{u}_1 = 1, \quad \mathbf{u}_2 = x, \quad \mathbf{u}_3 = x^2 \]

\[ \mathbf{v}_1 = 1, \quad \mathbf{v}_2 = x, \quad \mathbf{v}_3 = x^2, \quad \mathbf{v}_4 = x^3 \]

(b) Verify that the matrix $[T]_{B',B}$ obtained in part (a) satisfies
Formula (5) for every vector $\mathbf{x} = c_0 + c_1x + c_2x^2$ in $P_2$.

2. Let $T: P_2 \to P_1$ be the linear transformation defined by

\[ T(a_0 + a_1x + a_2x^2) = (a_0 + a_1) - (2a_1 + 3a_2)x \]

(a) Find the matrix for T relative to the standard bases
$B = \{1, x, x^2\}$ and $B' = \{1, x\}$ for $P_2$ and $P_1$.

(b) Verify that the matrix $[T]_{B',B}$ obtained in part (a) satisfies
Formula (5) for every vector $\mathbf{x} = c_0 + c_1x + c_2x^2$ in $P_2$.

3. Let $T: P_2 \to P_2$ be the linear operator defined by

\[ T(a_0 + a_1x + a_2x^2) = a_0 + a_1(x - 1) + a_2(x - 1)^2 \]

(a) Find the matrix for T relative to the standard basis
$B = \{1, x, x^2\}$ for $P_2$.

(b) Verify that the matrix $[T]_B$ obtained in part (a) satisfies
Formula (8) for every vector $\mathbf{x} = a_0 + a_1x + a_2x^2$ in $P_2$.


6. Let $T: R^3 \to R^3$ be the linear operator defined by

\[ T(x_1, x_2, x_3) = (x_1 - x_2, x_2 - x_1, x_1 - x_3) \]

(a) Find the matrix for T with respect to the basis
$B = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$, where

\[ \mathbf{v}_1 = (1, 0, 1), \quad \mathbf{v}_2 = (0, 1, 1), \quad \mathbf{v}_3 = (1, 1, 0) \]

(b) Verify that Formula (8) holds for every vector
$\mathbf{x} = (x_1, x_2, x_3)$ in $R^3$.

(c) Is T one-to-one? If so, find the matrix of $T^{-1}$ with respect
to the basis B.

7. Let $T: P_2 \to P_2$ be the linear operator defined by
$T(p(x)) = p(2x + 1)$, that is,

\[ T(c_0 + c_1x + c_2x^2) = c_0 + c_1(2x + 1) + c_2(2x + 1)^2 \]

(a) Find $[T]_B$ with respect to the basis $B = \{1, x, x^2\}$.

(b) Use the three-step procedure illustrated in Example 2 to
compute $T(2 - 3x + 4x^2)$.

(c) Check the result obtained in part (b) by computing
$T(2 - 3x + 4x^2)$ directly.

8. Let $T: P_2 \to P_3$ be the linear transformation defined by
$T(p(x)) = xp(x - 3)$, that is,

• R 2 be the linear operator defined by 


4. Let T : R 2 


and let B = {u 3 , u 2 } be the basis for which 


( 

"xi" 

)- 

Xi — X2 


X 2 _ 

) 

_X\ + X2_ 


Ui 


an J u 2 = 


-L 

0 


(a) Find [T] 5 . 

(b) Verify that Formula (8) holds for every vector x in R 2 . 
5. Let T : R 2 —*■ R 3 be defined by 

X\ d- 2 x 2 


X\ 

0 


(a) Find the matrix [T] B i B relative to the bases 
B = {u|. u 2 } and B' = { v 2 , v 2 , v 3 ), where 


u, = 


u 2 = 


-2 

4 


\[ T(c_0 + c_1x + c_2x^2) = x(c_0 + c_1(x - 3) + c_2(x - 3)^2) \]

(a) Find $[T]_{B',B}$ relative to the bases $B = \{1, x, x^2\}$ and
$B' = \{1, x, x^2, x^3\}$.

(b) Use the three-step procedure illustrated in Example 2 to
compute $T(1 + x - x^2)$.

(c) Check the result obtained in part (b) by computing
$T(1 + x - x^2)$ directly.


9. Let vi = 


and v 2 = 


- 1 ‘ 

4 


, and let A = 


1 3' 

-2 5 


be the 


matrix for T : R 2 -> R 2 relative to the basis B = { Vi , v 2 } - 
fa) Find [r(v,)] B and [7\v 2 )] B . 

(b) Find Ffvi) and T(y 2 ). 

(c) Find a formula for T 1 ' ' 


x 2 



T 


" 2 " 


'3' 


3 

-2 

1 

O' 

V| = 

1 

, v 2 = 

2 

, v 3 = 

0 

10. Let A = 

1 

6 

2 

1 


1 


0 


0 


-3 

0 

7 

1 


(d) Use the formula obtained in (c) to compute T 


be the matrix for 


(b) Verify that Formula (5) holds for every vector in R 2 


$T: R^4 \to R^3$ relative to the bases $B = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3, \mathbf{v}_4\}$ and
$B' = \{\mathbf{w}_1, \mathbf{w}_2, \mathbf{w}_3\}$, where



"O' 


- 2 - 


- r 


■6" 


1 


1 


4 


9 

Vi = 


, V 2 = 


, V 3 = 


, v 4 = 



1 


-1 


-1 


4 


.1. 


1_ 


2_ 


. 2 . 



'O' 


'-7' 


'-6' 

Wi = 

1 

OO OO 

1 

, W 2 = 

8 

1. 

, w 3 = 

9 

1_ 


(a) Find [rCvObs [r(v 2 )] flS [r(v 3 )] flS and [T(v 4 )] fl -. 

(b) Find TCvO, T(v 2 ), T(v 3 ), and P(v 4 ). 



/ 

"u" 

\ 

(c) Find a formula for T 


x 2 



X 3 




-X 4 . 

/ 


(d) Use the formula obtained in (c) to compute T 


/ 

'2' 

\ 


2 



0 


V 

.0, 

/ 


11. Let A = 


1 

2 

6 


3 

0 

-2 


-r 

5 

4 


be the matrix for T : P 2 —■ ► P 2 with 


respect to the basis B = {vi, v 2 , v 3 j, where V| = 3x + 3x 2 , 
v 2 = — 1 + 3x + 2x 2 , v 3 = 3 + 7x + 2x 2 . 

(a) Find [r(v!)] B , [7 Xv 2 )]j*. and [T(v 3 )] B . 

(b) Find 7(vi), T(y 2 ), and r(v 3 ). 

(c) Find a formula for T(a 0 + a\X + a 2 x 2 ). 

(d) Use the formula obtained in (c) to compute T( 1 + x 2 ). 


12. Let $T_1: P_1 \to P_2$ be the linear transformation defined by
$T_1(p(x)) = xp(x)$

and let $T_2: P_2 \to P_2$ be the linear operator defined by
$T_2(p(x)) = p(2x + 1)$


(c) Verify that the matrices in part (a) satisfy the formula you 
stated in part (b). 

14. Let $B = \{v_1, v_2, v_3, v_4\}$ be a basis for a vector space $V$. Find
the matrix with respect to $B$ for the linear operator $T: V \to V$
defined by $T(v_1) = v_2$, $T(v_2) = v_3$, $T(v_3) = v_4$, $T(v_4) = v_1$.


15. Let $T: P_2 \to M_{22}$ be the linear transformation defined by

\[
T(p) = \begin{bmatrix} p(0) & p(1) \\ p(-1) & p(0) \end{bmatrix}
\]

let $B$ be the standard basis for $M_{22}$, and let $B' = \{1, x, x^2\}$,
$B'' = \{1, 1 + x, 1 + x^2\}$ be bases for $P_2$.

(a) Find $[T]_{B,B'}$ and $[T]_{B,B''}$.

(b) For the matrices obtained in part (a), compute
$T(2 + 2x + x^2)$ using the three-step procedure illustrated
in Example 2.

(c) Check the results obtained in part (b) by computing
$T(2 + 2x + x^2)$ directly.


16. Let T : M 22 — »• R 2 be the linear transformation given by 
T 


a b 

\_ 

ci ~\~ b “I - c 

c d 

J' 

d 


and let B be the standard basis for M 22 , B 1 the standard basis 
for R 2 , and 


B" = 

(a) Find and [T] fl » B . 

"l 2 


-1 

0 


(b) Compute T 


3 4 


using the three-step procedure 


that was illustrated in Example 2 for both matrices found 
in part (a). 

(c) Check the results obtained in part (b) by computing 

1 2 ~l 

^ ) directly. 


Let $B = \{1, x\}$ and $B' = \{1, x, x^2\}$ be the standard bases for
$P_1$ and $P_2$.

(a) Find $[T_2 \circ T_1]_{B',B}$, $[T_2]_{B'}$, and $[T_1]_{B',B}$.

(b) State a formula relating the matrices in part (a).

(c) Verify that the matrices in part (a) satisfy the formula you
stated in part (b).

13. Let $T_1: P_1 \to P_2$ be the linear transformation defined by
$T_1(c_0 + c_1x) = 2c_0 - 3c_1x$

and let $T_2: P_2 \to P_3$ be the linear transformation defined by


17. (Calculus required) Let $D: P_2 \to P_2$ be the differentiation
operator $D(p) = p'(x)$.

(a) Find the matrix for $D$ relative to the basis $B = \{p_1, p_2, p_3\}$
for $P_2$ in which $p_1 = 1$, $p_2 = x$, $p_3 = x^2$.

(b) Use the matrix in part (a) to compute $D(6 - 6x + 24x^2)$.

18. (Calculus required) Let $D: P_2 \to P_2$ be the differentiation
operator $D(p) = p'(x)$.

(a) Find the matrix for $D$ relative to the basis $B = \{p_1, p_2, p_3\}$
for $P_2$ in which $p_1 = 2$, $p_2 = 2 - 3x$, $p_3 = 2 - 3x + 8x^2$.

(b) Use the matrix in part (a) to compute $D(6 - 6x + 24x^2)$.


$T_2(c_0 + c_1x + c_2x^2) = 3c_0x + 3c_1x^2 + 3c_2x^3$
Let $B = \{1, x\}$, $B'' = \{1, x, x^2\}$, and $B' = \{1, x, x^2, x^3\}$.

(a) Find $[T_2 \circ T_1]_{B',B}$, $[T_2]_{B',B''}$, and $[T_1]_{B'',B}$.

(b) State a formula relating the matrices in part (a).


19. (Calculus required) Let $V$ be the vector space of real-valued
functions defined on the interval $(-\infty, \infty)$, and let $D: V \to V$
be the differentiation operator.

(a) Find the matrix for $D$ relative to the basis $B = \{f_1, f_2, f_3\}$
for $V$ in which $f_1 = 1$, $f_2 = \sin x$, $f_3 = \cos x$.




(b) Use the matrix in part (a) to compute 

D(2 + 3 sin x — 4 cos x) 

20. Let V be a four-dimensional vector space with basis B , let 
W be a seven-dimensional vector space with basis B', and 
let T : V — *■ W be a linear transformation. Identify the four 
vector spaces that contain the vectors at the corners of the 
accompanying diagram. 


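[Figure Ex-20: a four-corner diagram. The top arrow, labeled "Direct computation", ends at $T(x)$;
the bottom arrow (2), labeled "Multiply by $[T]_{B',B}$", runs from $[x]_B$ to $[T(x)]_{B'}$;
the vertical arrows are labeled (1) and (3).]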


21. In each part, fill in the missing part of the equation.

(a) $[T_2 \circ T_1]_{B',B} = [T_2]_{\_\_\_\_}\,[T_1]_{B'',B}$

(b) $[T_3 \circ T_2 \circ T_1]_{B',B} = [T_3]_{\_\_\_\_}\,[T_2]_{B''',B''}\,[T_1]_{B'',B}$


23. Prove that if $B$ and $B'$ are the standard bases for $R^n$ and
$R^m$, respectively, then the matrix for a linear transformation
$T: R^n \to R^m$ relative to the bases $B$ and $B'$ is the standard
matrix for $T$.


True-False Exercises 


TF. In parts (a)-(e) determine whether the statement is true or
false, and justify your answer.

(a) If the matrix of a linear transformation $T: V \to W$ relative to
some bases of $V$ and $W$ is $\begin{bmatrix} 2 & 4 \\ 0 & 3 \end{bmatrix}$, then there is a nonzero
vector $x$ in $V$ such that $T(x) = 2x$.

(b) If the matrix of a linear transformation $T: V \to W$ relative to
bases for $V$ and $W$ is $\begin{bmatrix} 2 & 4 \\ 0 & 3 \end{bmatrix}$, then there is a nonzero vector
$x$ in $V$ such that $T(x) = 4x$.

(c) If the matrix of a linear transformation $T: V \to W$ relative to
certain bases for $V$ and $W$ is $\begin{bmatrix} 1 & 4 \\ 2 & 3 \end{bmatrix}$, then $T$ is one-to-one.


Working with Proofs 

22. Prove that if $T: V \to W$ is the zero transformation, then the
matrix for $T$ with respect to any bases for $V$ and $W$ is a zero
matrix.


(d) If $S: V \to V$ and $T: V \to V$ are linear operators and $B$ is a
basis for $V$, then the matrix of $S \circ T$ relative to $B$ is $[T]_B[S]_B$.

(e) If $T: V \to V$ is an invertible linear operator and $B$ is a basis
for $V$, then the matrix for $T^{-1}$ relative to $B$ is $[T]_B^{-1}$.


8.5 Similarity 

The matrix for a linear operator $T: V \to V$ depends on the basis selected for $V$. One of the
fundamental problems of linear algebra is to choose a basis for $V$ that makes the matrix for
$T$ as simple as possible, such as a diagonal or a triangular matrix. In this section we
will study this problem.


Simple Matrices for Linear 
Operators 


Standard bases do not necessarily produce the simplest matrices for linear operators. For
example, consider the matrix operator $T: R^2 \to R^2$ whose matrix relative to the standard
basis $B = \{e_1, e_2\}$ for $R^2$ is

\[
[T]_B = \begin{bmatrix} 1 & 1 \\ -2 & 4 \end{bmatrix} \tag{1}
\]

Let us compare this matrix to the matrix $[T]_{B'}$ for the same operator $T$ but relative to
the basis $B' = \{u_1', u_2'\}$ for $R^2$ in which

\[
u_1' = \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \quad u_2' = \begin{bmatrix} 1 \\ 2 \end{bmatrix} \tag{2}
\]

Since

\[
T(u_1') = \begin{bmatrix} 1 & 1 \\ -2 & 4 \end{bmatrix} \begin{bmatrix} 1 \\ 1 \end{bmatrix}
= \begin{bmatrix} 2 \\ 2 \end{bmatrix} = 2u_1'
\quad \text{and} \quad
T(u_2') = \begin{bmatrix} 1 & 1 \\ -2 & 4 \end{bmatrix} \begin{bmatrix} 1 \\ 2 \end{bmatrix}
= \begin{bmatrix} 3 \\ 6 \end{bmatrix} = 3u_2'
\]

it follows that

\[
[T(u_1')]_{B'} = \begin{bmatrix} 2 \\ 0 \end{bmatrix}
\quad \text{and} \quad
[T(u_2')]_{B'} = \begin{bmatrix} 0 \\ 3 \end{bmatrix}
\]

so the matrix for $T$ relative to the basis $B'$ is

\[
[T]_{B'} = \big[\, [T(u_1')]_{B'} \mid [T(u_2')]_{B'} \,\big] = \begin{bmatrix} 2 & 0 \\ 0 & 3 \end{bmatrix}
\]

This matrix, being diagonal, has a simpler form than $[T]_B$ and conveys clearly that the
operator $T$ scales $u_1'$ by a factor of 2 and $u_2'$ by a factor of 3, information that is not
immediately evident from $[T]_B$.

One of the major themes in more advanced linear algebra courses is to determine
the “simplest possible form” that can be obtained for the matrix of a linear operator by
choosing the basis appropriately. Sometimes it is possible to obtain a diagonal matrix
(as above, for example), whereas other times one must settle for a triangular matrix or
some other form. We will only be able to touch on this important topic in this text.

The problem of finding a basis that produces the simplest possible matrix for a linear
operator $T: V \to V$ can be attacked by first finding a matrix for $T$ relative to any basis,
typically a standard basis, where applicable, and then changing the basis in a way that
simplifies the matrix. Before pursuing this idea, it will be helpful to revisit some concepts
about changing bases.


A New View of Transition
Matrices


Recall from Formulas (7) and (8) of Section 4.6 that if $B = \{u_1, u_2, \ldots, u_n\}$ and
$B' = \{u_1', u_2', \ldots, u_n'\}$ are bases for a vector space $V$, then the transition matrices from $B$
to $B'$ and from $B'$ to $B$ are

\[
P_{B \to B'} = \big[\, [u_1]_{B'} \mid [u_2]_{B'} \mid \cdots \mid [u_n]_{B'} \,\big] \tag{3}
\]

\[
P_{B' \to B} = \big[\, [u_1']_B \mid [u_2']_B \mid \cdots \mid [u_n']_B \,\big] \tag{4}
\]

where the matrices $P_{B \to B'}$ and $P_{B' \to B}$ are inverses of each other. We also showed in
Formulas (11) and (12) of that section that if $v$ is any vector in $V$, then

\[
P_{B \to B'}[v]_B = [v]_{B'} \tag{5}
\]

\[
P_{B' \to B}[v]_{B'} = [v]_B \tag{6}
\]

The following theorem shows that the transition matrices in Formulas (3) and (4) can be
viewed as matrices for identity operators.


THEOREM 8.5.1 If $B$ and $B'$ are bases for a finite-dimensional vector space $V$, and if
$I: V \to V$ is the identity operator on $V$, then

\[
P_{B \to B'} = [I]_{B',B} \quad \text{and} \quad P_{B' \to B} = [I]_{B,B'}
\]

Proof Suppose that $B = \{u_1, u_2, \ldots, u_n\}$ and $B' = \{u_1', u_2', \ldots, u_n'\}$ are bases for $V$.
Using the fact that $I(v) = v$ for all $v$ in $V$, it follows from Formula (4) of Section 8.4 that

\[
\begin{aligned}
[I]_{B',B} &= \big[\, [I(u_1)]_{B'} \mid [I(u_2)]_{B'} \mid \cdots \mid [I(u_n)]_{B'} \,\big] \\
&= \big[\, [u_1]_{B'} \mid [u_2]_{B'} \mid \cdots \mid [u_n]_{B'} \,\big] \\
&= P_{B \to B'} \qquad \text{[Formula (3) above]}
\end{aligned}
\]

The proof that $[I]_{B,B'} = P_{B' \to B}$ is similar.


Effect of Changing Bases
on Matrices of Linear
Operators


We are now ready to consider the main problem in this section.


Problem If $B$ and $B'$ are two bases for a finite-dimensional vector space $V$, and if
$T: V \to V$ is a linear operator, what relationship, if any, exists between the matrices
$[T]_B$ and $[T]_{B'}$?





The answer to this question can be obtained by considering the composition of the three 
linear operators on V pictured in Figure 8.5.1. 


[Figure 8.5.1: $v \xrightarrow{\;I\;} v \xrightarrow{\;T\;} T(v) \xrightarrow{\;I\;} T(v)$, where the four copies of $V$
carry the bases $B'$, $B$, $B$, and $B'$, respectively.]

In this figure, $v$ is first mapped into itself by the identity operator, then $v$ is mapped
into $T(v)$ by $T$, and then $T(v)$ is mapped into itself by the identity operator. All four
vector spaces involved in the composition are the same (namely, $V$), but the bases for the
spaces vary. Since the starting vector is $v$ and the final vector is $T(v)$, the composition
produces the same result as applying $T$ directly; that is,

\[
T = I \circ T \circ I \tag{7}
\]

If, as illustrated in Figure 8.5.1, the first and last vector spaces are assigned the basis $B'$
and the middle two spaces are assigned the basis $B$, then it follows from (7) and
Formula (12) of Section 8.4 (with an appropriate adjustment to the names of the bases) that

\[
[T]_{B',B'} = [I \circ T \circ I]_{B',B'} = [I]_{B',B}\,[T]_{B,B}\,[I]_{B,B'} \tag{8}
\]

or, in simpler notation,

\[
[T]_{B'} = [I]_{B',B}\,[T]_B\,[I]_{B,B'} \tag{9}
\]

We can simplify this formula even further by using Theorem 8.5.1 to rewrite it as

\[
[T]_{B'} = P_{B \to B'}\,[T]_B\,P_{B' \to B} \tag{10}
\]

In summary, we have the following theorem.

THEOREM 8.5.2 Let $T: V \to V$ be a linear operator on a finite-dimensional vector
space $V$, and let $B$ and $B'$ be bases for $V$. Then

\[
[T]_{B'} = P^{-1}[T]_B P \tag{11}
\]

where $P = P_{B' \to B}$ and $P^{-1} = P_{B \to B'}$.
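
Formula (11) can be checked numerically on the operator from the start of this section. The sketch below is only an illustration and not part of the text's development: the helper names `matmul` and `inverse_2x2` are hypothetical, and the use of exact rational arithmetic is an assumption made here to avoid roundoff. It builds $P$ from the standard coordinates of $u_1'$ and $u_2'$ given in (2) and forms $P^{-1}[T]_B P$.

```python
from fractions import Fraction


def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def inverse_2x2(M):
    """Invert a 2x2 matrix exactly (hypothetical helper for this sketch)."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("matrix is not invertible")
    return [[d / det, -b / det], [-c / det, a / det]]


# [T]_B from Formula (1), relative to the standard basis B.
T_B = [[1, 1], [-2, 4]]

# P = P_{B' -> B}: its columns are the B-coordinates of u1' = (1, 1) and u2' = (1, 2).
P = [[1, 1], [1, 2]]

# Formula (11): [T]_{B'} = P^{-1} [T]_B P.
T_Bprime = matmul(matmul(inverse_2x2(P), T_B), P)
print([[int(entry) for entry in row] for row in T_Bprime])
```

Under these assumptions the result is the diagonal matrix with entries 2 and 3 obtained at the beginning of the section. Swapping the roles of $P$ and $P^{-1}$ in the last step generally does not give $[T]_{B'}$, which is the pitfall discussed next.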

When applying Theorem 8.5.2, it is easy to forget whether $P = P_{B' \to B}$ (correct) or
$P = P_{B \to B'}$ (incorrect). It may help to use the diagram in Figure 8.5.2 and observe that the
exterior subscripts of the transition matrices match the subscript of the matrix they enclose.

[Figure 8.5.2: $[T]_{B'} = P_{B \to B'}\,[T]_B\,P_{B' \to B}$, with the exterior subscripts $B'$ marked.]

In the terminology of Definition 1 of Section 5.2, Theorem 8.5.2 tells us that matrices
representing the same linear operator relative to different bases must be similar. The
following theorem, which we state without proof, shows that the converse of Theorem 8.5.2
is also true.


THEOREM 8.5.3 If $V$ is a finite-dimensional vector space, then two matrices $A$ and $B$
represent the same linear operator (but possibly with respect to different bases) if and
only if they are similar. Moreover, if $B = P^{-1}AP$, then $P$ is the transition matrix from
the basis used for $B$ to the basis used for $A$.


► EXAMPLE 1 Similar Matrices Represent the Same Linear Operator

We showed at the beginning of this section that the matrices

\[
C = \begin{bmatrix} 1 & 1 \\ -2 & 4 \end{bmatrix}
\quad \text{and} \quad
D = \begin{bmatrix} 2 & 0 \\ 0 & 3 \end{bmatrix}
\]

represent the same linear operator $T: R^2 \to R^2$. Verify that these matrices are similar by
finding a matrix $P$ for which $D = P^{-1}CP$.

Solution We need to find the transition matrix

\[
P = P_{B' \to B} = \big[\, [u_1']_B \mid [u_2']_B \,\big]
\]

where $B' = \{u_1', u_2'\}$ is the basis for $R^2$ given by (2) and $B = \{e_1, e_2\}$ is the standard basis
for $R^2$. We see by inspection that

\[
u_1' = e_1 + e_2, \qquad u_2' = e_1 + 2e_2
\]

from which it follows that

\[
[u_1']_B = \begin{bmatrix} 1 \\ 1 \end{bmatrix}
\quad \text{and} \quad
[u_2']_B = \begin{bmatrix} 1 \\ 2 \end{bmatrix}
\]

Thus,

\[
P = P_{B' \to B} = \big[\, [u_1']_B \mid [u_2']_B \,\big] = \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix}
\]

We leave it for you to verify that

\[
P^{-1} = \begin{bmatrix} 2 & -1 \\ -1 & 1 \end{bmatrix}
\]

and hence that

\[
\underbrace{\begin{bmatrix} 2 & 0 \\ 0 & 3 \end{bmatrix}}_{D}
= \underbrace{\begin{bmatrix} 2 & -1 \\ -1 & 1 \end{bmatrix}}_{P^{-1}}
\underbrace{\begin{bmatrix} 1 & 1 \\ -2 & 4 \end{bmatrix}}_{C}
\underbrace{\begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix}}_{P}
\]


Similarity Invariants


Recall from Section 5.2 that a property of a square matrix is called a similarity invariant
if that property is shared by all similar matrices. In Table 1 of that section we listed
the most important similarity invariants. Since we know from Theorem 8.5.3 that two
matrices are similar if and only if they represent the same linear operator $T: V \to V$, it
follows that if $B$ and $B'$ are bases for $V$, then every similarity invariant property of $[T]_B$
is also a similarity invariant property of $[T]_{B'}$. For example, for any two bases $B$ and $B'$
we must have

\[
\det[T]_B = \det[T]_{B'}
\]

It follows from this equation that the value of the determinant depends on $T$, but not on
the particular basis that is used to represent $T$ in matrix form. Thus, the determinant
can be regarded as a property of the linear operator $T$, and we can define the determinant
of the linear operator $T$ to be

\[
\det(T) = \det[T]_B \tag{12}
\]

where $B$ is any basis for $V$. Table 1 lists the basic similarity invariants of a linear operator
$T: V \to V$.

Table 1 Similarity Invariants

Property | Description
Determinant | $[T]_B$ and $P^{-1}[T]_BP$ have the same determinant.
Invertibility | $[T]_B$ is invertible if and only if $P^{-1}[T]_BP$ is invertible.
Rank | $[T]_B$ and $P^{-1}[T]_BP$ have the same rank.
Nullity | $[T]_B$ and $P^{-1}[T]_BP$ have the same nullity.
Trace | $[T]_B$ and $P^{-1}[T]_BP$ have the same trace.
Characteristic polynomial | $[T]_B$ and $P^{-1}[T]_BP$ have the same characteristic polynomial.
Eigenvalues | $[T]_B$ and $P^{-1}[T]_BP$ have the same eigenvalues.
Eigenspace dimension | If $\lambda$ is an eigenvalue of $[T]_B$ and $P^{-1}[T]_BP$, then the eigenspace of $[T]_B$ corresponding to $\lambda$ and the eigenspace of $P^{-1}[T]_BP$ corresponding to $\lambda$ have the same dimension.
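
Several rows of Table 1 can be spot-checked on the two matrices that represent the operator from the start of this section. The following is a minimal sketch, not a general similarity test: the function names are hypothetical, and it handles only $2 \times 2$ matrices, for which $\det(\lambda I - M) = \lambda^2 - \operatorname{tr}(M)\lambda + \det(M)$.

```python
def det2(M):
    """Determinant of a 2x2 matrix stored as a list of rows."""
    (a, b), (c, d) = M
    return a * d - b * c


def trace(M):
    """Sum of the diagonal entries of a square matrix."""
    return sum(M[i][i] for i in range(len(M)))


def char_poly_2x2(M):
    """Coefficients (1, -tr M, det M) of det(lambda*I - M) for a 2x2 matrix."""
    return (1, -trace(M), det2(M))


T_B = [[1, 1], [-2, 4]]       # matrix relative to the standard basis
T_Bprime = [[2, 0], [0, 3]]   # matrix relative to B' = {u1', u2'}

for name, invariant in [("determinant", det2),
                        ("trace", trace),
                        ("characteristic polynomial", char_poly_2x2)]:
    print(name, invariant(T_B), invariant(T_Bprime))
```

Agreement of these quantities is necessary for similarity but does not by itself prove it; here similarity is already known because both matrices represent the same operator.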


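► EXAMPLE 2 Determinant of a Linear Operator

At the beginning of this section we showed that the matrices

\[
[T] = \begin{bmatrix} 1 & 1 \\ -2 & 4 \end{bmatrix}
\quad \text{and} \quad
[T]_{B'} = \begin{bmatrix} 2 & 0 \\ 0 & 3 \end{bmatrix}
\]

represent the same linear operator relative to different bases, the first relative to the
standard basis $B = \{e_1, e_2\}$ for $R^2$ and the second relative to the basis $B' = \{u_1', u_2'\}$ for
which

\[
u_1' = \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \quad u_2' = \begin{bmatrix} 1 \\ 2 \end{bmatrix}
\]

This means that $[T]$ and $[T]_{B'}$ must be similar matrices and hence must have the same
similarity invariant properties. In particular, they must have the same determinant. We
leave it for you to verify that

\[
\det[T] = \begin{vmatrix} 1 & 1 \\ -2 & 4 \end{vmatrix} = 6
\quad \text{and} \quad
\det[T]_{B'} = \begin{vmatrix} 2 & 0 \\ 0 & 3 \end{vmatrix} = 6
\]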


► EXAMPLE 3 Eigenvalues of a Linear Operator

Find the eigenvalues of the linear operator $T: P_2 \to P_2$ defined by

\[
T(a + bx + cx^2) = -2c + (a + 2b + c)x + (a + 3c)x^2
\]

Solution Because eigenvalues are similarity invariants, we can find the eigenvalues of $T$
by choosing any basis $B$ for $P_2$ and computing the eigenvalues of the matrix $[T]_B$. We
leave it for you to show that the matrix for $T$ relative to the standard basis $B = \{1, x, x^2\}$
is

\[
[T]_B = \begin{bmatrix} 0 & 0 & -2 \\ 1 & 2 & 1 \\ 1 & 0 & 3 \end{bmatrix}
\]

Thus, the eigenvalues of $T$ are $\lambda = 1$ and $\lambda = 2$ (Example 7 of Section 5.1).
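
The eigenvalues quoted in Example 3 can be confirmed by evaluating the characteristic polynomial of $[T]_B$ directly. The sketch below is an illustration under stated assumptions rather than a general eigenvalue routine: the function names are hypothetical, the matrix is hard-coded as $3 \times 3$, and only integer candidates in a small range are tested, which suffices here because the polynomial is monic of degree 3 and its roots 1, 2, 2 (counted with multiplicity) are all found.

```python
def det3(M):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def char_poly_at(M, lam):
    """Evaluate det(lam*I - M) for a 3x3 matrix M."""
    shifted = [[(lam if row == col else 0) - M[row][col] for col in range(3)]
               for row in range(3)]
    return det3(shifted)


# Matrix for T relative to the standard basis {1, x, x^2} of P_2.
T_B = [[0, 0, -2],
       [1, 2, 1],
       [1, 0, 3]]

# Integer values of lambda at which the characteristic polynomial vanishes.
roots = [lam for lam in range(-10, 11) if char_poly_at(T_B, lam) == 0]
print(roots)
```

Expanding by hand gives $\det(\lambda I - [T]_B) = (\lambda - 1)(\lambda - 2)^2$, so the search reports the two distinct eigenvalues.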
Exercise Set 8.5

In Exercises 1-2, use a property from Table 1 to show that the 
matrices A and B are not similar. 


8. T:R 2 ^R 2 is defined by 


1. (a) A = 

(b) A = 

2. (a) A = 

(b) A = 


1 3 
1 1 


B = 


1 2 
1 1 


( 

~Xy 


Xy + 1X2 

{ 

X2. 

)- 

3xy — 4x 2 


and B = {u t , u 2 j and B' = (vi, v 2 ), where 


1 

1 

B 


-1 


0 



'2' 


' 4" 


'18' 


'10" 

1 

2 



0 


1 


Hi = 

2_ 

, u 2 = 

1_ 

; V! = 

_ 8_ 

, v 2 = 

_ 5_ 

1 

1 

r 



'1 

1 

r 

9. T: R 1 —>■ R 2 is defined by 





1 

1 

0 

, B = 

1 

1 

0 


T(Xy,X 2 ,Xy) = ( 

— 2.Xi — X 2 , Xy + Xy, X 2 ) 


1 

0 

0 



1 

1 

0 










'1 0 r 
0 1 0 
0 1 0 


B = 


'0 0 f 
0 0 1 
1 0 0 


3. Let T : R 2 — > R 2 be a linear operator, and let B and B' be bases 
for R 2 for which 


m* = 


and P B ^ B ' = 


2 0 
1 1 

Find the matrix for T relative to the basis B' . 


3 2 
1 1 


= 


3 2 

-1 1 


and P B '^b = 


4 5 

1 -1 


m# = 


and P B ->b' = 


2 0 
1 1 

Find the matrix for T relative to the basis B. 


3 2 

1 1 


VT] B ' = 


and P B '->b = 


3 2 

-1 1 

Find the matrix for T relative to the basis B. 


4 5 

1 -1 


( 

V 

v 

Xy — 2.X 2 


X2. 

) 

-X2 


and B — {ui, u 2 j and B' = { v 2 , v 2 ), where 
ui = 


T 


'O' 


'4' 


'7" 


, U 2 = 


; vi = 


, v 2 = 


_ 0 _ 


1 


_ 1 _ 


_2_ 


$B$ is the standard basis, and $B' = \{v_1, v_2, v_3\}$, where

$v_1 = (-2, 1, 0)$, $v_2 = (-1, 0, 1)$, $v_3 = (0, 1, 0)$

10. $T: R^3 \to R^3$ is defined by

$T(x_1, x_2, x_3) = (x_1 + 2x_2 - x_3,\; -x_2,\; x_1 + 7x_3)$

$B$ is the standard basis, and $B' = \{v_1, v_2, v_3\}$, where
$v_1 = (1, 0, 0)$, $v_2 = (1, 1, 0)$, $v_3 = (1, 1, 1)$

11. $T: R^2 \to R^2$ is the rotation about the origin through an angle
of 45°, $B$ is the standard basis, and $B' = \{v_1, v_2\}$, where


4. Let T : R 2 — > R 2 be a linear operator, and let B and B' be bases 
for R 2 for which 


vi = 


(jz’ji)’ V2_ ( V2’ V 2 ) 


Find the matrix for T relative to the basis B' . 

5. Let T : R 2 -*■ R 2 be a linear operator, and let B and B' be bases 
for R 2 for which 


6. Let T : R 2 -*■ R 2 be a linear operator, and let B and B' be bases 
for R 2 for which 


In Exercises 7-14, find the matrix for $T$ relative to the basis $B$,
and use Theorem 8.5.2 to compute the matrix for $T$ relative to the
basis $B'$.

7. T : R 2 R 2 is defined by 


12. $T: R^2 \to R^2$ is the shear in the $x$-direction by a positive factor
$k$, $B$ is the standard basis, and $B' = \{v_1, v_2\}$, where

$v_1 = (k, 1)$, $v_2 = (1, 0)$

13. $T: P_1 \to P_1$ is defined by

$T(a_0 + a_1x) = -a_0 + (a_0 + a_1)x$
$B$ is the standard basis for $P_1$, and $B' = \{q_1, q_2\}$, where
$q_1 = x + 1$, $q_2 = x - 1$

14. $T: P_1 \to P_1$ is defined by $T(a_0 + a_1x) = a_0 + a_1(x + 1)$, and
$B = \{p_1, p_2\}$ and $B' = \{q_1, q_2\}$, where

$p_1 = 6 + 3x$, $p_2 = 10 + 2x$; $q_1 = 2$, $q_2 = 3 + 2x$

15. Let $T: P_2 \to P_2$ be defined by

$T(a_0 + a_1x + a_2x^2) = (5a_0 + 6a_1 + 2a_2) - (a_1 + 8a_2)x + (a_0 - 2a_2)x^2$

(a) Find the eigenvalues of $T$.

(b) Find bases for the eigenspaces of $T$.

16. Let $T: M_{22} \to M_{22}$ be defined by

\[
T\left( \begin{bmatrix} a & b \\ c & d \end{bmatrix} \right) = \begin{bmatrix} 2c & a + c \\ b - 2c & d \end{bmatrix}
\]

(a) Find the eigenvalues of $T$.

(b) Find bases for the eigenspaces of $T$.

17. Since the standard basis for $R^n$ is so simple, why would one
want to represent a linear operator on $R^n$ in another basis?

18. Find two nonzero 2x2 matrices (different from those in 
Exercise 1) that are not similar, and explain why they are not. 

In Exercises 19-21, find the determinant and the eigenvalues
of the linear operator $T$.

19. $T: R^2 \to R^2$, where

$T(x_1, x_2) = (3x_1 - 4x_2,\; -x_1 + 7x_2)$

20. $T: R^3 \to R^3$, where

$T(x_1, x_2, x_3) = (x_1 - x_2,\; x_2 - x_3,\; x_3 - x_1)$

21. $T: P_2 \to P_2$, where
$T(p(x)) = p(x - 1)$

22. Let $T: P_4 \to P_4$ be the linear operator given by the formula

$T(p(x)) = p(2x + 1)$.

(a) Find a matrix for T relative to some convenient basis, and 
then use it to find the rank and nullity of T . 

(b) Use the result in part (a) to determine whether T is one- 
to-one. 

Working with Proofs 

23. Complete the proof below by justifying each step.

Hypothesis: $A$ and $B$ are similar matrices.

Conclusion: $A$ and $B$ have the same characteristic polynomial.

Proof:
(1) $\det(\lambda I - B) = \det(\lambda I - P^{-1}AP)$

(2) $= \det(\lambda P^{-1}P - P^{-1}AP)$

(3) $= \det(P^{-1}(\lambda I - A)P)$

(4) $= \det(P^{-1}) \det(\lambda I - A) \det(P)$

(5) $= \det(P^{-1}) \det(P) \det(\lambda I - A)$

(6) $= \det(\lambda I - A)$

24. If $A$ and $B$ are similar matrices, say $B = P^{-1}AP$, then it follows
from Exercise 23 that $A$ and $B$ have the same eigenvalues.
Suppose that $\lambda$ is one of the common eigenvalues and $x$ is a
corresponding eigenvector of $A$. See if you can find an eigenvector
of $B$ corresponding to $\lambda$ (expressed in terms of $\lambda$, $x$,
and $P$).

In Exercises 25-28, prove that the stated property is a similarity 
invariant. 

25. Trace 26. Rank 

27. Nullity 28. Invertibility 

29. Let $\lambda$ be an eigenvalue of a linear operator $T: V \to V$. Prove
that the eigenvectors of $T$ corresponding to $\lambda$ are the nonzero
vectors in the kernel of $\lambda I - T$.


30. (a) Prove that if $A$ and $B$ are similar matrices, then $A^2$ and $B^2$
are also similar.

(b) If $A^2$ and $B^2$ are similar, must $A$ and $B$ be similar? Explain.

31. Let $C$ and $D$ be $m \times n$ matrices, and let $B = \{v_1, v_2, \ldots, v_n\}$
be a basis for a vector space $V$. Prove that if $C[x]_B = D[x]_B$
for all $x$ in $V$, then $C = D$.

True-False Exercises 

TF. In parts (a)-(h) determine whether the statement is true or 

false, and justify your answer. 

(a) A matrix cannot be similar to itself. 

(b) If A is similar to B , and B is similar to C, then A is similar 
to C. 

(c) If A and B are similar and B is singular, then A is singular. 

(d) If $A$ and $B$ are invertible and similar, then $A^{-1}$ and $B^{-1}$ are
similar.

(e) If $T_1: R^n \to R^n$ and $T_2: R^n \to R^n$ are linear operators, and if
$[T_1]_{B',B} = [T_2]_{B',B}$ with respect to two bases $B$ and $B'$ for $R^n$,
then $T_1(x) = T_2(x)$ for every vector $x$ in $R^n$.

(f) If $T_1: R^n \to R^n$ is a linear operator, and if $[T_1]_B = [T_1]_{B'}$ with
respect to two bases $B$ and $B'$ for $R^n$, then $B = B'$.

(g) If $T: R^n \to R^n$ is a linear operator, and if $[T]_B = I_n$ with
respect to some basis $B$ for $R^n$, then $T$ is the identity operator
on $R^n$.

(h) If $T: R^n \to R^n$ is a linear operator, and if $[T]_{B',B} = I_n$ with
respect to two bases $B$ and $B'$ for $R^n$, then $T$ is the identity
operator on $R^n$.

Working with Technology 

T1. Use the matrices $A$ and $P$ given below to construct a matrix
$B = P^{-1}AP$ that is similar to $A$, and confirm, in accordance with
Table 1, that $A$ and $B$ have the same determinant, trace, rank,
characteristic equation, and eigenvalues.

\[
A = \begin{bmatrix} -13 & -60 & -60 \\ 10 & 42 & 40 \\ -5 & -20 & -18 \end{bmatrix}
\quad \text{and} \quad
P = \begin{bmatrix} 1 & -1 & 1 \\ 2 & -1 & -1 \\ -1 & -1 & 0 \end{bmatrix}
\]

T2. Let $T: R^3 \to R^3$ be the linear transformation whose standard
matrix is the matrix $A$ in Exercise T1. Find a basis $S$ for $R^3$ for
which $[T]_S$ is diagonal.

Supplementary Exercises


1. Let $A$ be an $n \times n$ matrix, $B$ a nonzero $n \times 1$ matrix, and $x$ a
vector in $R^n$ expressed in matrix notation. Is $T(x) = Ax + B$
a linear operator on $R^n$? Justify your answer.


8. Let $V$ and $W$ be vector spaces, let $T$, $T_1$, and $T_2$ be linear
transformations from $V$ to $W$, and let $k$ be a scalar. Define
new transformations, $T_1 + T_2$ and $kT$, by the formulas


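2. Let

\[
A = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}
\]

(a) Show that

\[
A^2 = \begin{bmatrix} \cos 2\theta & -\sin 2\theta \\ \sin 2\theta & \cos 2\theta \end{bmatrix}
\quad \text{and} \quad
A^3 = \begin{bmatrix} \cos 3\theta & -\sin 3\theta \\ \sin 3\theta & \cos 3\theta \end{bmatrix}
\]


$(T_1 + T_2)(x) = T_1(x) + T_2(x)$

$(kT)(x) = k(T(x))$

(a) Show that $(T_1 + T_2): V \to W$ and $kT: V \to W$ are both
linear transformations.

(b) Show that the set of all linear transformations from $V$ to
$W$ with the operations in part (a) is a vector space.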


(b) Based on your answer to part (a), make a guess at the form
of the matrix $A^n$ for any positive integer $n$.

(c) By considering the geometric effect of multiplication by
$A$, obtain the result in part (b) geometrically.

3. Devise a method for finding two $n \times n$ matrices that are not
similar. Use your method to find two $3 \times 3$ matrices that are
not similar.

4. Let $v_1, v_2, \ldots, v_m$ be fixed vectors in $R^n$, and let $T: R^n \to R^m$
be the function defined by $T(x) = (x \cdot v_1, x \cdot v_2, \ldots, x \cdot v_m)$,
where $x \cdot v_i$ is the Euclidean inner product on $R^n$.

(a) Show that $T$ is a linear transformation.

(b) Show that the matrix with row vectors $v_1, v_2, \ldots, v_m$ is the
standard matrix for $T$.


9. Let $A$ and $B$ be similar matrices. Prove:

(a) $A^T$ and $B^T$ are similar.

(b) If $A$ and $B$ are invertible, then $A^{-1}$ and $B^{-1}$ are similar.


10. (Fredholm Alternative Theorem) Let $T: V \to V$ be a linear
operator on an $n$-dimensional vector space. Prove that exactly
one of the following statements holds:

(i) The equation $T(x) = b$ has a solution for all vectors
$b$ in $V$.

(ii) Nullity of $T > 0$.


11. Let T : M 22 -* M 22 be the linear operator defined by 


T(X) = 


T 

0 


X + X 


0 

1 


0" 

1 


5. Let $\{e_1, e_2, e_3, e_4\}$ be the standard basis for $R^4$, and let
$T: R^4 \to R^3$ be the linear transformation for which
$T(e_1) = (1, 2, 1)$, $T(e_2) = (0, 1, 0)$,

$T(e_3) = (1, 3, 0)$, $T(e_4) = (1, 1, 1)$

(a) Find bases for the range and kernel of $T$.

(b) Find the rank and nullity of $T$.


Find the rank and nullity of $T$.

12. Prove: If $A$ and $B$ are similar matrices, and if $B$ and $C$ are
also similar matrices, then $A$ and $C$ are similar matrices.

13. Let $L: M_{22} \to M_{22}$ be the linear operator that is defined by
$L(M) = M^T$. Find the matrix for $L$ with respect to the standard
basis for $M_{22}$.


6. Suppose that vectors in R 3 are denoted by 1 x 3 matrices, and 
define T : R 3 — > R 3 by 


7X1*1 *2 * 3 ]) = [*1 *2 * 3 ] 


(a) Find a basis for the kernel of T . 

(b) Find a basis for the range of T . 

7. Let $B = \{v_1, v_2, v_3, v_4\}$ be a basis for a vector space $V$, and let
$T: V \to V$ be the linear operator for which

$T(v_1) = v_1 + v_2 + v_3 + 3v_4$
$T(v_2) = v_1 - v_2 + 2v_3 + 2v_4$
$T(v_3) = 2v_1 - 4v_2 + 5v_3 + 3v_4$
$T(v_4) = -2v_1 + 6v_2 - 6v_3 - 2v_4$

(a) Find the rank and nullity of $T$.

(b) Determine whether $T$ is one-to-one.


14. Let $B = \{u_1, u_2, u_3\}$ and $B' = \{v_1, v_2, v_3\}$ be bases for a vector
space $V$, and let


"-1 

2 

4' 


'2 

-1 

3“ 

3 

0 

1 

P = 

1 

1 

4 





_0 

1 

2_ 

2 

2 

5 






be the transition matrix from $B'$ to $B$.

(a) Express $v_1, v_2, v_3$ as linear combinations of $u_1, u_2, u_3$.

(b) Express $u_1, u_2, u_3$ as linear combinations of $v_1, v_2, v_3$.

15. Let $B = \{u_1, u_2, u_3\}$ be a basis for a vector space $V$, and let
$T: V \to V$ be a linear operator for which

\[
[T]_B = \begin{bmatrix} -3 & 4 & 7 \\ 1 & 0 & -2 \\ 0 & 1 & 0 \end{bmatrix}
\]

Find $[T]_{B'}$, where $B' = \{v_1, v_2, v_3\}$ is the basis for $V$ defined
by

$v_1 = u_1$, $v_2 = u_1 + u_2$, $v_3 = u_1 + u_2 + u_3$

16. Show that the matrices

\[
\begin{bmatrix} 1 & 1 \\ -1 & 4 \end{bmatrix}
\quad \text{and} \quad
\begin{bmatrix} 2 & 1 \\ 1 & 3 \end{bmatrix}
\]

are similar but that

\[
\begin{bmatrix} 3 & 1 \\ -6 & -2 \end{bmatrix}
\quad \text{and} \quad
\begin{bmatrix} -1 & 2 \\ 1 & 0 \end{bmatrix}
\]

are not.


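17. Suppose that $T: V \to V$ is a linear operator, and $B$ is a basis
for $V$ for which

\[
[T(x)]_B = \begin{bmatrix} x_1 - x_2 + x_3 \\ x_2 \\ x_1 - x_3 \end{bmatrix}
\quad \text{if} \quad
[x]_B = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
\]

Find $[T]_B$.

18. Let $T: V \to V$ be a linear operator. Prove that $T$ is one-to-one
if and only if $\det(T) \neq 0$.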


where

\[
P_1(x) = \frac{(x - x_2)(x - x_3)}{(x_1 - x_2)(x_1 - x_3)}, \quad
P_2(x) = \frac{(x - x_1)(x - x_3)}{(x_2 - x_1)(x_2 - x_3)}, \quad
P_3(x) = \frac{(x - x_1)(x - x_2)}{(x_3 - x_1)(x_3 - x_2)}
\]

(d) What relationship exists between the graph of the function

$a_1P_1(x) + a_2P_2(x) + a_3P_3(x)$

and the points $(x_1, a_1)$, $(x_2, a_2)$, and $(x_3, a_3)$?


22. (Calculus required) Let $p(x)$ and $q(x)$ be continuous functions,
and let $V$ be the subspace of $C(-\infty, \infty)$ consisting of all
twice differentiable functions. Define $L: V \to V$ by

$L(y(x)) = y''(x) + p(x)y'(x) + q(x)y(x)$


19. (Calculus required)

(a) Show that if $f = f(x)$ is twice differentiable, then the
function $D: C^2(-\infty, \infty) \to F(-\infty, \infty)$ defined by
$D(f) = f''(x)$ is a linear transformation.

(b) Find a basis for the kernel of $D$.

(c) Show that the set of functions satisfying the equation
$D(f) = f(x)$ is a two-dimensional subspace of
$C^2(-\infty, \infty)$, and find a basis for this subspace.


20. Let $T: P_2 \to R^3$ be the function defined by the formula

\[
T(p(x)) = \begin{bmatrix} p(-1) \\ p(0) \\ p(1) \end{bmatrix}
\]

(a) Find $T(x^2 + 5x + 6)$.

(b) Show that $T$ is a linear transformation.

(c) Show that $T$ is one-to-one.

(d) Find $T^{-1}(0, 3, 0)$.

(e) Sketch the graph of the polynomial in part (d).


21. Let $x_1$, $x_2$, and $x_3$ be distinct real numbers such that

$x_1 < x_2 < x_3$

and let $T: P_2 \to R^3$ be the function defined by the formula

\[
T(p(x)) = \begin{bmatrix} p(x_1) \\ p(x_2) \\ p(x_3) \end{bmatrix}
\]

(a) Show that $T$ is a linear transformation.

(b) Show that $T$ is one-to-one.

(c) Verify that if $a_1$, $a_2$, and $a_3$ are any real numbers, then

\[
T^{-1}\left( \begin{bmatrix} a_1 \\ a_2 \\ a_3 \end{bmatrix} \right) = a_1P_1(x) + a_2P_2(x) + a_3P_3(x)
\]


(a) Show that $L$ is a linear transformation.

(b) Consider the special case where $p(x) = 0$ and $q(x) = 1$.
Show that the function

$\phi(x) = c_1 \sin x + c_2 \cos x$

is in the kernel of $L$ for all real values of $c_1$ and $c_2$.


23. (Calculus required) Let $D: P_n \to P_n$ be the differentiation
operator $D(p) = p'$. Show that the matrix for $D$ relative to
the basis $B = \{1, x, x^2, \ldots, x^n\}$ is

\[
\begin{bmatrix}
0 & 1 & 0 & 0 & \cdots & 0 \\
0 & 0 & 2 & 0 & \cdots & 0 \\
0 & 0 & 0 & 3 & \cdots & 0 \\
\vdots & \vdots & \vdots & \vdots & & \vdots \\
0 & 0 & 0 & 0 & \cdots & n \\
0 & 0 & 0 & 0 & \cdots & 0
\end{bmatrix}
\]

24. (Calculus required) It can be shown that for any real number
$c$, the vectors

\[
1, \quad x - c, \quad \frac{(x - c)^2}{2!}, \quad \ldots, \quad \frac{(x - c)^n}{n!}
\]

form a basis for $P_n$. Find the matrix for the differentiation
operator of Exercise 23 with respect to this basis.


25. (Calculus required) Let $J: P_n \to P_{n+1}$ be the integration
transformation defined by

\[
J(p) = \int_0^x (a_0 + a_1t + \cdots + a_nt^n)\,dt
= a_0x + \frac{a_1}{2}x^2 + \cdots + \frac{a_n}{n+1}x^{n+1}
\]

where $p = a_0 + a_1x + \cdots + a_nx^n$. Find the matrix for $J$ with
respect to the standard bases for $P_n$ and $P_{n+1}$.

INTRODUCTION This chapter is concerned with “numerical methods” of linear algebra, an area of study
that encompasses techniques for solving large-scale linear systems and for finding
numerical approximations of various kinds. It is not our objective to discuss
algorithms and technical issues in fine detail since there are many excellent books on
the subject. Rather, we will be concerned with introducing some of the basic ideas and
exploring two important contemporary applications that rely heavily on numerical
ideas: singular value decomposition and data compression. A computing utility such
as MATLAB, Mathematica, or Maple is recommended for Sections 9.2 to 9.5.


9.1 LU-Decompositions

Up to now, we have focused on two methods for solving linear systems, Gaussian
elimination (reduction to row echelon form) and Gauss-Jordan elimination (reduction to
reduced row echelon form). While these methods are fine for the small-scale problems
in this text, they are not suitable for large-scale problems in which computer roundoff
error, memory usage, and speed are concerns. In this section we will discuss a method
for solving a linear system of $n$ equations in $n$ unknowns that is based on factoring its
coefficient matrix into a product of lower and upper triangular matrices. This method,
called “LU-decomposition,” is the basis for many computer algorithms in common use.


Solving Linear Systems by 
Factoring 


Our first goal in this section is to show how to solve a linear system Ax = b of n equations 
in n unknowns by factoring the coefficient matrix A. We begin with some terminology. 


DEFINITION 1 A factorization of a square matrix $A$ as

\[
A = LU \tag{1}
\]

where $L$ is lower triangular and $U$ is upper triangular, is called an LU-decomposition
(or LU-factorization) of $A$.


Before we consider the problem of obtaining an LU-decomposition, we will explain
how such decompositions can be used to solve linear systems, and we will give an
illustrative example.

The Method of LU-Decomposition

Step 1. Rewrite the system $Ax = b$ as

\[
LUx = b \tag{2}
\]

Step 2. Define a new $n \times 1$ matrix $y$ by

\[
Ux = y \tag{3}
\]

Step 3. Use (3) to rewrite (2) as $Ly = b$ and solve this system for $y$.

Step 4. Substitute $y$ in (3) and solve for $x$.

This procedure, which is illustrated in Figure 9.1.1, replaces the single linear system
$Ax = b$ by a pair of linear systems

\[
Ux = y, \qquad Ly = b
\]

that must be solved in succession. However, since each of these systems has a triangular 
coefficient matrix, it generally turns out to involve no more computation to solve the 
two systems than to solve the original system directly. 
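
Steps 3 and 4 amount to one forward substitution followed by one back substitution. The sketch below illustrates this under stated assumptions and is not the text's own algorithm listing: the function names are hypothetical, $L$ and $U$ are assumed to be already known with nonzero diagonal entries, and exact fractions are used so that no roundoff enters. The sample data are the factors and right-hand side of the $3 \times 3$ system treated in Example 1 of this section.

```python
from fractions import Fraction


def forward_substitution(L, b):
    """Solve Ly = b for lower triangular L, working from the top row down."""
    n = len(b)
    y = [Fraction(0)] * n
    for i in range(n):
        known = sum(L[i][j] * y[j] for j in range(i))
        y[i] = (b[i] - known) / Fraction(L[i][i])
    return y


def back_substitution(U, y):
    """Solve Ux = y for upper triangular U, working from the bottom row up."""
    n = len(y)
    x = [Fraction(0)] * n
    for i in reversed(range(n)):
        known = sum(U[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (y[i] - known) / Fraction(U[i][i])
    return x


def solve_lu(L, U, b):
    """Solve LUx = b: Step 3 (Ly = b), then Step 4 (Ux = y)."""
    y = forward_substitution(L, b)
    return back_substitution(U, y)


# Factors and right-hand side from Example 1 of this section.
L = [[2, 0, 0], [-3, 1, 0], [4, -3, 7]]
U = [[1, 3, 1], [0, 1, 3], [0, 0, 1]]
b = [2, 2, 3]

x = solve_lu(L, U, b)
print([int(value) for value in x])
```

Each substitution touches only the entries on one side of the diagonal, which is why solving the two triangular systems is cheap once the factorization is available.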


[Figure 9.1.1: Solve $Ax = b$]



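► EXAMPLE 1 Solving Ax = b by LU-Decomposition 

Later in this section we will derive the factorization 

\[
\underbrace{\begin{bmatrix} 2 & 6 & 2 \\ -3 & -8 & 0 \\ 4 & 9 & 2 \end{bmatrix}}_{A}
=
\underbrace{\begin{bmatrix} 2 & 0 & 0 \\ -3 & 1 & 0 \\ 4 & -3 & 7 \end{bmatrix}}_{L}
\underbrace{\begin{bmatrix} 1 & 3 & 1 \\ 0 & 1 & 3 \\ 0 & 0 & 1 \end{bmatrix}}_{U}
\tag{4}
\]

Use this result to solve the linear system 

\[
\underbrace{\begin{bmatrix} 2 & 6 & 2 \\ -3 & -8 & 0 \\ 4 & 9 & 2 \end{bmatrix}}_{A}
\underbrace{\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}}_{x}
=
\underbrace{\begin{bmatrix} 2 \\ 2 \\ 3 \end{bmatrix}}_{b}
\]

From (4) we can rewrite this system as 

\[
\underbrace{\begin{bmatrix} 2 & 0 & 0 \\ -3 & 1 & 0 \\ 4 & -3 & 7 \end{bmatrix}}_{L}
\underbrace{\begin{bmatrix} 1 & 3 & 1 \\ 0 & 1 & 3 \\ 0 & 0 & 1 \end{bmatrix}}_{U}
\underbrace{\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}}_{x}
=
\underbrace{\begin{bmatrix} 2 \\ 2 \\ 3 \end{bmatrix}}_{b}
\tag{5}
\]
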

In 1979 an important library of machine-independent linear algebra programs called 
LINPACK was developed at Argonne National Laboratories. Many of the programs in that library use 
the decomposition methods that we will study in this section. Variations of the LINPACK routines are 
used in many computer programs, including MATLAB, Mathematica, and Maple. 
Alan Mathison Turing 
(1912-1954) 


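As specified in Step 2 above, let us define $y_1$, $y_2$, and $y_3$ by the equation 

\[
\underbrace{\begin{bmatrix} 1 & 3 & 1 \\ 0 & 1 & 3 \\ 0 & 0 & 1 \end{bmatrix}}_{U}
\underbrace{\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}}_{x}
=
\underbrace{\begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix}}_{y}
\tag{6}
\]

which allows us to rewrite (5) as 

\[
\underbrace{\begin{bmatrix} 2 & 0 & 0 \\ -3 & 1 & 0 \\ 4 & -3 & 7 \end{bmatrix}}_{L}
\underbrace{\begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix}}_{y}
=
\underbrace{\begin{bmatrix} 2 \\ 2 \\ 3 \end{bmatrix}}_{b}
\tag{7}
\]

or equivalently as 

\[
\begin{aligned}
2y_1 &= 2 \\
-3y_1 + y_2 &= 2 \\
4y_1 - 3y_2 + 7y_3 &= 3
\end{aligned}
\]

This system can be solved by a procedure that is similar to back substitution, except 
that we solve the equations from the top down instead of from the bottom up. This 
procedure, called forward substitution, yields 

\[ y_1 = 1, \quad y_2 = 5, \quad y_3 = 2 \]

(verify). As indicated in Step 4 above, we substitute these values into (6), which yields 
the linear system 

\[
\begin{bmatrix} 1 & 3 & 1 \\ 0 & 1 & 3 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
=
\begin{bmatrix} 1 \\ 5 \\ 2 \end{bmatrix}
\]

or, equivalently, 

\[
\begin{aligned}
x_1 + 3x_2 + x_3 &= 1 \\
x_2 + 3x_3 &= 5 \\
x_3 &= 2
\end{aligned}
\]

Solving this system by back substitution yields 

\[ x_1 = 2, \quad x_2 = -1, \quad x_3 = 2 \]

(verify). 
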

Although the ideas were known earlier, credit for popularizing the matrix formulation of 
the LU-decomposition is often given to the British mathematician Alan Turing for his work on the subject 
in 1948. Turing, one of the great geniuses of the twentieth century, is the founder of the field of artificial 
intelligence. Among his many accomplishments in that field, he developed the concept of an internally 
programmed computer before the practical technology had reached the point where the construction of 
such a machine was possible. During World War II Turing was secretly recruited by the British government's 
Code and Cypher School at Bletchley Park to help break the Nazi Enigma codes; it was Turing's statistical 
approach that provided the breakthrough. In addition to being a brilliant mathematician, Turing was a world- 
class runner who competed successfully with Olympic-level competition. Sadly, Turing, a homosexual, was 
tried and convicted of "gross indecency" in 1952, in violation of the then-existing British statutes. Depressed, 
he committed suicide at age 41 by eating an apple laced with cyanide. 

[Image: ©National Portrait Gallery] 
Finding LU-Decompositions 

The preceding example illustrates that once an LU-decomposition of A is obtained, 
a linear system Ax = b can be solved by one forward substitution and one backward 
substitution. The main advantage of this method over Gaussian and Gauss-Jordan 
elimination is that it “decouples” A from b so that for solving a sequence of linear 
systems with the same coefficient matrix A, say 

\[ Ax = b_1, \quad Ax = b_2, \quad \ldots, \quad Ax = b_k \]

the work in factoring A need only be performed once, after which it can be reused for 
each system in the sequence. Such sequences occur in problems in which the matrix A 
remains fixed but the matrix b varies with time. 
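
The reuse of a single factorization can be sketched as follows. This is an illustrative sketch only: the helper name `solve_with_factors` is invented here, the factors are those of Example 1, and the second right-hand side is a hypothetical vector chosen for the demonstration (it is not from the text).

```python
def solve_with_factors(L, U, b):
    """Solve LUx = b by forward substitution (Ly = b), then back substitution (Ux = y)."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(L[i][j] * y[j] for j in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(U[i][j] * x[j] for j in range(i + 1, n))) / U[i][i]
    return x


# The factors are computed once (here: copied from Example 1) ...
L = [[2, 0, 0], [-3, 1, 0], [4, -3, 7]]
U = [[1, 3, 1], [0, 1, 3], [0, 0, 1]]

# ... and reused for every right-hand side. The first b is from Example 1;
# the second is a hypothetical vector added for illustration.
right_hand_sides = [[2, 2, 3], [2, -3, 4]]
solutions = [solve_with_factors(L, U, b) for b in right_hand_sides]
print(solutions)
```

Only the two inexpensive triangular solves are repeated for each new b; the elimination work that produced L and U is not.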

Not every square matrix has an LU-decomposition. However, if it is possible to 
reduce a square matrix A to row echelon form by Gaussian elimination without performing 
any row interchanges, then A will have an LU-decomposition, though it may not be 
unique. To see why this is so, assume that A has been reduced to a row echelon form U 
using a sequence of row operations that does not include row interchanges. We know 
from Theorem 1.5.1 that these operations can be accomplished by multiplying A on the 
left by an appropriate sequence of elementary matrices; that is, there exist elementary 
matrices $E_1, E_2, \ldots, E_k$ such that 

\[ E_k \cdots E_2 E_1 A = U \tag{8} \]

Since elementary matrices are invertible, we can solve (8) for A as 

\[ A = E_1^{-1} E_2^{-1} \cdots E_k^{-1} U \]

or more briefly as 

A = LU (9) 

where 

\[ L = E_1^{-1} E_2^{-1} \cdots E_k^{-1} \tag{10} \]

We now have all of the ingredients to prove the following result. 


THEOREM 9.1.1 If A is a square matrix that can be reduced to a row echelon form 
U by Gaussian elimination without row interchanges, then A can be factored as 
A = LU, where L is a lower triangular matrix. 


Proof Let L and U be the matrices in Formulas (10) and (8), respectively. The matrix 
U is upper triangular because it is a row echelon form of a square matrix (so all entries 
below its main diagonal are zero). To prove that L is lower triangular, it suffices to prove 
that each factor on the right side of (10) is lower triangular, since Theorem 1.7.1(b) will 
then imply that L itself is lower triangular. Since row interchanges are excluded, each $E_j$ 
results either by adding a scalar multiple of one row of an identity matrix to a row below 
or by multiplying one row of an identity matrix by a nonzero scalar. In either case, the 
resulting matrix $E_j$ is lower triangular and hence so is $E_j^{-1}$ by Theorem 1.7.1(d). This 
completes the proof. 


► EXAMPLE 2 An LU-Decomposition 

Find an LU-decomposition of 

\[
A = \begin{bmatrix} 2 & 6 & 2 \\ -3 & -8 & 0 \\ 4 & 9 & 2 \end{bmatrix}
\]
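Solution To obtain an LU-decomposition, A = LU, we will reduce A to a row echelon 
form U using Gaussian elimination and then calculate L from (10). The steps are as 
follows, starting from 

\[
A = \begin{bmatrix} 2 & 6 & 2 \\ -3 & -8 & 0 \\ 4 & 9 & 2 \end{bmatrix}
\]

Step 1. Row operation: (1/2) × row 1 

\[
E_1 = \begin{bmatrix} \tfrac{1}{2} & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \qquad
E_1^{-1} = \begin{bmatrix} 2 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \qquad
\text{result: } \begin{bmatrix} 1 & 3 & 1 \\ -3 & -8 & 0 \\ 4 & 9 & 2 \end{bmatrix}
\]

Step 2. Row operation: (3 × row 1) + row 2 

\[
E_2 = \begin{bmatrix} 1 & 0 & 0 \\ 3 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \qquad
E_2^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ -3 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \qquad
\text{result: } \begin{bmatrix} 1 & 3 & 1 \\ 0 & 1 & 3 \\ 4 & 9 & 2 \end{bmatrix}
\]

Step 3. Row operation: (-4 × row 1) + row 3 

\[
E_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -4 & 0 & 1 \end{bmatrix}, \qquad
E_3^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 4 & 0 & 1 \end{bmatrix}, \qquad
\text{result: } \begin{bmatrix} 1 & 3 & 1 \\ 0 & 1 & 3 \\ 0 & -3 & -2 \end{bmatrix}
\]

Step 4. Row operation: (3 × row 2) + row 3 

\[
E_4 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 3 & 1 \end{bmatrix}, \qquad
E_4^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -3 & 1 \end{bmatrix}, \qquad
\text{result: } \begin{bmatrix} 1 & 3 & 1 \\ 0 & 1 & 3 \\ 0 & 0 & 7 \end{bmatrix}
\]

Step 5. Row operation: (1/7) × row 3 

\[
E_5 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & \tfrac{1}{7} \end{bmatrix}, \qquad
E_5^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 7 \end{bmatrix}, \qquad
\text{result: } \begin{bmatrix} 1 & 3 & 1 \\ 0 & 1 & 3 \\ 0 & 0 & 1 \end{bmatrix} = U
\]

and, from (10), 

\[
L =
\begin{bmatrix} 2 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 \\ -3 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 4 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -3 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 7 \end{bmatrix}
=
\begin{bmatrix} 2 & 0 & 0 \\ -3 & 1 & 0 \\ 4 & -3 & 7 \end{bmatrix}
\]

so 

\[
\begin{bmatrix} 2 & 6 & 2 \\ -3 & -8 & 0 \\ 4 & 9 & 2 \end{bmatrix}
=
\begin{bmatrix} 2 & 0 & 0 \\ -3 & 1 & 0 \\ 4 & -3 & 7 \end{bmatrix}
\begin{bmatrix} 1 & 3 & 1 \\ 0 & 1 & 3 \\ 0 & 0 & 1 \end{bmatrix}
\tag{11}
\]

is an LU-decomposition of A. 
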


Bookkeeping 


As Example 2 shows, most of the work in constructing an LU-decomposition is expended 
in calculating L. However, all this work can be eliminated by some careful bookkeeping 
of the operations used to reduce A to U. 

Because we are assuming that no row interchanges are required to reduce A to U, 
there are only two types of operations involved: multiplying a row by a nonzero constant, 
and adding a scalar multiple of one row to another. The first operation is used to 
introduce the leading 1's and the second to introduce zeros below the leading 1's. 

In Example 2, a multiplier of 1/2 was needed in Step 1 to introduce a leading 1 in 
the first row, and a multiplier of 1/7 was needed in Step 5 to introduce a leading 1 in the 
third row. No actual multiplier was required to introduce a leading 1 in the second row 
because it was already a 1 at the end of Step 2, but for convenience let us say that the 
multiplier was 1. Comparing these multipliers with the successive diagonal entries of L, 
we see that these diagonal entries are precisely the reciprocals of the multipliers used to 
construct U: 

\[
L = \begin{bmatrix} \boxed{2} & 0 & 0 \\ -3 & \boxed{1} & 0 \\ 4 & -3 & \boxed{7} \end{bmatrix}
\tag{12}
\]



Also observe in Example 2 that to introduce zeros below the leading 1 in the first row, 
we used the operations 

add 3 times the first row to the second 
add -4 times the first row to the third 

and to introduce the zero below the leading 1 in the second row, we used the operation 

add 3 times the second row to the third 

Now note in (11) that in each position below the main diagonal of L, the entry is the 
negative of the multiplier in the operation that introduced the zero in that position in 
U. This suggests the following procedure for constructing an LU-decomposition of a 
square matrix A, assuming that this matrix can be reduced to row echelon form without 
row interchanges. 
Procedure for Constructing an LU-Decomposition 

Step 1. Reduce A to a row echelon form U by Gaussian elimination without row 
interchanges, keeping track of the multipliers used to introduce the leading 
1's and the multipliers used to introduce the zeros below the leading 1's. 

Step 2. In each position along the main diagonal of L, place the reciprocal of the 
multiplier that introduced the leading 1 in that position in U. 

Step 3. In each position below the main diagonal of L, place the negative of the 
multiplier used to introduce the zero in that position in U. 

Step 4. Form the decomposition A = LU. 


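► EXAMPLE 3 Constructing an LU-Decomposition 

Find an LU-decomposition of 

\[
A = \begin{bmatrix} 6 & -2 & 0 \\ 9 & -1 & 1 \\ 3 & 7 & 5 \end{bmatrix}
\]

Solution We will reduce A to a row echelon form U and at each step we will fill in an 
entry of L in accordance with the four-step procedure above. In the displays below, the 
matrix on the left is the current stage of the reduction, the matrix on the right is the 
current state of L, and • denotes an unknown entry of L. 

\[
\begin{bmatrix} 6 & -2 & 0 \\ 9 & -1 & 1 \\ 3 & 7 & 5 \end{bmatrix}
\qquad
\begin{bmatrix} \bullet & 0 & 0 \\ \bullet & \bullet & 0 \\ \bullet & \bullet & \bullet \end{bmatrix}
\]

Row 1, multiplier = 1/6: 

\[
\begin{bmatrix} 1 & -\tfrac{1}{3} & 0 \\ 9 & -1 & 1 \\ 3 & 7 & 5 \end{bmatrix}
\qquad
\begin{bmatrix} 6 & 0 & 0 \\ \bullet & \bullet & 0 \\ \bullet & \bullet & \bullet \end{bmatrix}
\]

Row 2, multiplier = -9, and row 3, multiplier = -3: 

\[
\begin{bmatrix} 1 & -\tfrac{1}{3} & 0 \\ 0 & 2 & 1 \\ 0 & 8 & 5 \end{bmatrix}
\qquad
\begin{bmatrix} 6 & 0 & 0 \\ 9 & \bullet & 0 \\ 3 & \bullet & \bullet \end{bmatrix}
\]

Row 2, multiplier = 1/2: 

\[
\begin{bmatrix} 1 & -\tfrac{1}{3} & 0 \\ 0 & 1 & \tfrac{1}{2} \\ 0 & 8 & 5 \end{bmatrix}
\qquad
\begin{bmatrix} 6 & 0 & 0 \\ 9 & 2 & 0 \\ 3 & \bullet & \bullet \end{bmatrix}
\]

Row 3, multiplier = -8: 

\[
\begin{bmatrix} 1 & -\tfrac{1}{3} & 0 \\ 0 & 1 & \tfrac{1}{2} \\ 0 & 0 & 1 \end{bmatrix}
\qquad
\begin{bmatrix} 6 & 0 & 0 \\ 9 & 2 & 0 \\ 3 & 8 & \bullet \end{bmatrix}
\]

Row 3, multiplier = 1 (no actual operation is performed here since there is already a 
leading 1 in the third row): 

\[
U = \begin{bmatrix} 1 & -\tfrac{1}{3} & 0 \\ 0 & 1 & \tfrac{1}{2} \\ 0 & 0 & 1 \end{bmatrix}
\qquad
L = \begin{bmatrix} 6 & 0 & 0 \\ 9 & 2 & 0 \\ 3 & 8 & 1 \end{bmatrix}
\]

Thus, we have constructed the LU-decomposition 

\[
A = LU =
\begin{bmatrix} 6 & 0 & 0 \\ 9 & 2 & 0 \\ 3 & 8 & 1 \end{bmatrix}
\begin{bmatrix} 1 & -\tfrac{1}{3} & 0 \\ 0 & 1 & \tfrac{1}{2} \\ 0 & 0 & 1 \end{bmatrix}
\]
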


We leave it for you to confirm this end result by multiplying the factors. 
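
The bookkeeping procedure can also be sketched in code. The sketch below is illustrative rather than definitive: the names `lu_by_bookkeeping` and `matmul` are invented here, exact rational arithmetic (`fractions.Fraction`) is used so that the factors match the hand computation, and the sketch assumes every pivot it meets is nonzero (it raises an error instead of interchanging rows). It is applied to the matrix of Example 3.

```python
from fractions import Fraction


def lu_by_bookkeeping(A):
    """Return (L, U) with A = LU, where U has 1's on its main diagonal.

    Assumption of this sketch: every pivot is nonzero, so no row
    interchange is needed; otherwise a ValueError is raised.
    """
    n = len(A)
    U = [[Fraction(v) for v in row] for row in A]
    L = [[Fraction(0)] * n for _ in range(n)]
    for k in range(n):
        pivot = U[k][k]
        if pivot == 0:
            raise ValueError("zero pivot: a row interchange would be required")
        # The multiplier 1/pivot introduces the leading 1;
        # its reciprocal (the pivot itself) goes on the diagonal of L.
        L[k][k] = pivot
        U[k] = [v / pivot for v in U[k]]
        # The multiplier -U[i][k] introduces the zero below the leading 1;
        # its negative goes in the matching position of L.
        for i in range(k + 1, n):
            m = U[i][k]
            L[i][k] = m
            U[i] = [a - m * b for a, b in zip(U[i], U[k])]
    return L, U


def matmul(X, Y):
    """Plain matrix product, used only to confirm the factorization."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


A = [[6, -2, 0], [9, -1, 1], [3, 7, 5]]  # matrix of Example 3
L, U = lu_by_bookkeeping(A)
print(L)
print(U)
print(matmul(L, U) == A)
```

Multiplying the two factors back together, as the last line does, is the confirmation the example leaves to the reader.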
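LU-Decompositions Are 
Not Unique 


In general, LU-decompositions are not unique. For example, if 

\[
A = LU =
\begin{bmatrix} l_{11} & 0 & 0 \\ l_{21} & l_{22} & 0 \\ l_{31} & l_{32} & l_{33} \end{bmatrix}
\begin{bmatrix} 1 & u_{12} & u_{13} \\ 0 & 1 & u_{23} \\ 0 & 0 & 1 \end{bmatrix}
\]

and L has nonzero diagonal entries (which will be true if A is invertible), then we can 
shift the diagonal entries from the left factor to the right factor by writing 

\[
\begin{aligned}
A &=
\begin{bmatrix} 1 & 0 & 0 \\ l_{21}/l_{11} & 1 & 0 \\ l_{31}/l_{11} & l_{32}/l_{22} & 1 \end{bmatrix}
\begin{bmatrix} l_{11} & 0 & 0 \\ 0 & l_{22} & 0 \\ 0 & 0 & l_{33} \end{bmatrix}
\begin{bmatrix} 1 & u_{12} & u_{13} \\ 0 & 1 & u_{23} \\ 0 & 0 & 1 \end{bmatrix} \\
&=
\begin{bmatrix} 1 & 0 & 0 \\ l_{21}/l_{11} & 1 & 0 \\ l_{31}/l_{11} & l_{32}/l_{22} & 1 \end{bmatrix}
\begin{bmatrix} l_{11} & l_{11}u_{12} & l_{11}u_{13} \\ 0 & l_{22} & l_{22}u_{23} \\ 0 & 0 & l_{33} \end{bmatrix}
\end{aligned}
\]

which is another LU-decomposition of A. 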


LDU-Decompositions 

The method we have given for computing LU-decompositions may result in an “asymmetry” 
in that the matrix U has 1's on the main diagonal but L need not. However, if 
it is preferred to have 1's on the main diagonal of both the lower triangular factor and 
the upper triangular factor, then we can “shift” the diagonal entries of L to a diagonal 
matrix D and write L as 

L = L'D 

where L' is a lower triangular matrix with 1's on the main diagonal. For example, a 
general 3 × 3 lower triangular matrix with nonzero entries on the main diagonal can be 
factored as 

\[
\underbrace{\begin{bmatrix} a_{11} & 0 & 0 \\ a_{21} & a_{22} & 0 \\ a_{31} & a_{32} & a_{33} \end{bmatrix}}_{L}
=
\underbrace{\begin{bmatrix} 1 & 0 & 0 \\ a_{21}/a_{11} & 1 & 0 \\ a_{31}/a_{11} & a_{32}/a_{22} & 1 \end{bmatrix}}_{L'}
\underbrace{\begin{bmatrix} a_{11} & 0 & 0 \\ 0 & a_{22} & 0 \\ 0 & 0 & a_{33} \end{bmatrix}}_{D}
\]


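Note that the columns of L' are obtained by dividing each entry in the corresponding 
column of L by the diagonal entry in the column. Thus, for example, we can rewrite 
(4) as 

\[
\begin{aligned}
\begin{bmatrix} 2 & 6 & 2 \\ -3 & -8 & 0 \\ 4 & 9 & 2 \end{bmatrix}
&=
\begin{bmatrix} 2 & 0 & 0 \\ -3 & 1 & 0 \\ 4 & -3 & 7 \end{bmatrix}
\begin{bmatrix} 1 & 3 & 1 \\ 0 & 1 & 3 \\ 0 & 0 & 1 \end{bmatrix} \\
&=
\begin{bmatrix} 1 & 0 & 0 \\ -\tfrac{3}{2} & 1 & 0 \\ 2 & -3 & 1 \end{bmatrix}
\begin{bmatrix} 2 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 7 \end{bmatrix}
\begin{bmatrix} 1 & 3 & 1 \\ 0 & 1 & 3 \\ 0 & 0 & 1 \end{bmatrix}
\end{aligned}
\tag{13}
\]

If desired, the diagonal matrix and the upper triangular matrix in (13) can be multiplied 
to produce an LU-decomposition in which the 1's are on the main diagonal of L 
rather than U. 
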


One can prove that if A is an invertible matrix that can be reduced to row echelon 
form without row interchanges, then A can be factored uniquely as 

A = LDU 

where L is a lower triangular matrix with 1's on the main diagonal, D is a diagonal 
matrix, and U is an upper triangular matrix with 1's on the main diagonal. This is called 
the LDU-decomposition (or LDU-factorization) of A. 
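
As a hedged sketch of the diagonal "shift" described above: given factors L and U in which U already has 1's on its main diagonal and L has nonzero diagonal entries, each column of L is divided by its diagonal entry to produce L' and D. The function name `ldu_from_lu` is invented for this illustration, and exact fractions are used so that the output matches (13).

```python
from fractions import Fraction


def ldu_from_lu(L, U):
    """Split L as L'D, returning (L', D, U).

    Assumptions of this sketch: L is lower triangular with nonzero
    diagonal entries and U already has 1's on its main diagonal.
    """
    n = len(L)
    d = [Fraction(L[j][j]) for j in range(n)]
    if any(v == 0 for v in d):
        raise ValueError("L must have nonzero diagonal entries")
    # Column j of L' is column j of L divided by the diagonal entry L[j][j].
    L_prime = [[Fraction(L[i][j]) / d[j] for j in range(n)] for i in range(n)]
    D = [[d[i] if i == j else Fraction(0) for j in range(n)] for i in range(n)]
    return L_prime, D, U


# Factors from (4).
L = [[2, 0, 0], [-3, 1, 0], [4, -3, 7]]
U = [[1, 3, 1], [0, 1, 3], [0, 0, 1]]
L_prime, D, U_unit = ldu_from_lu(L, U)
print(L_prime)
print(D)
```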

PLU-Decompositions 

Many computer algorithms for solving linear systems perform row interchanges to 
reduce roundoff error, in which case the existence of an LU-decomposition is not 
guaranteed. However, it is possible to work around this problem by “preprocessing” the 
coefficient matrix A so that the row interchanges are performed prior to computing the 
LU-decomposition itself. More specifically, the idea is to create a matrix Q (called a 
permutation matrix) by multiplying, in sequence, those elementary matrices that produce 
the row interchanges and then execute them by computing the product QA. This product 
can then be reduced to row echelon form without row interchanges, so it is assured to 
have an LU-decomposition 

QA = LU (14) 

Because the matrix Q is invertible (being a product of elementary matrices), the systems 
Ax = b and QAx = Qb will have the same solutions. But it follows from (14) that 
the latter system can be rewritten as LUx = Qb and hence can be solved using 
LU-decomposition. 

It is common to see Equation (14) expressed as 

A = PLU (15) 

in which $P = Q^{-1}$. This is called a PLU-decomposition (or PLU-factorization) of A. 
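
One way such a factorization might be computed is sketched below. Several choices here are assumptions of the sketch rather than anything prescribed by the text: the names `plu_decomposition` and `matmul` are invented, the pivot in each column is taken to be the entry of largest absolute value, the 1's are placed on the diagonal of L (not U), and exact fractions are used so that the identity A = PLU can be checked exactly. The test matrix is the 2 × 2 interchange matrix, which has no LU-decomposition on its own.

```python
from fractions import Fraction


def plu_decomposition(A):
    """Return (P, L, U) with A = PLU; L has unit diagonal, U is upper triangular."""
    n = len(A)
    U = [[Fraction(v) for v in row] for row in A]
    L = [[Fraction(0)] * n for _ in range(n)]
    perm = list(range(n))  # perm[i] = index of the original row now in position i
    for k in range(n):
        # Assumed pivoting rule: largest absolute value in column k.
        p = max(range(k, n), key=lambda i: abs(U[i][k]))
        if U[p][k] == 0:
            continue  # nothing to eliminate in this column
        if p != k:
            U[k], U[p] = U[p], U[k]
            L[k], L[p] = L[p], L[k]
            perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):
            m = U[i][k] / U[k][k]
            L[i][k] = m
            U[i] = [a - m * b for a, b in zip(U[i], U[k])]
    for i in range(n):
        L[i][i] = Fraction(1)
    # Q has a 1 in row i, column perm[i], so QA = LU; P is its inverse (transpose).
    P = [[Fraction(1) if perm[j] == i else Fraction(0) for j in range(n)]
         for i in range(n)]
    return P, L, U


def matmul(X, Y):
    """Plain matrix product, used only to check A = PLU."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


A = [[0, 1], [1, 0]]
P, L, U = plu_decomposition(A)
print(P, L, U)
print(matmul(P, matmul(L, U)) == A)
```

Because the pivoting rule and the placement of the unit diagonal are choices, a different utility can return a different but equally valid PLU-decomposition of the same matrix.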


Exercise Set 9.1 


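1. Use the method of Example 1 and the LU-decomposition 

\[
\begin{bmatrix} 3 & -6 \\ -2 & 5 \end{bmatrix}
=
\begin{bmatrix} 3 & 0 \\ -2 & 1 \end{bmatrix}
\begin{bmatrix} 1 & -2 \\ 0 & 1 \end{bmatrix}
\]

to solve the system 

\[
\begin{aligned}
3x_1 - 6x_2 &= 0 \\
-2x_1 + 5x_2 &= 1
\end{aligned}
\]

2. Use the method of Example 1 and the LU-decomposition 

\[
\begin{bmatrix} 3 & -6 & -3 \\ 2 & 0 & 6 \\ -4 & 7 & 4 \end{bmatrix}
=
\begin{bmatrix} 3 & 0 & 0 \\ 2 & 4 & 0 \\ -4 & -1 & 2 \end{bmatrix}
\begin{bmatrix} 1 & -2 & -1 \\ 0 & 1 & 2 \\ 0 & 0 & 1 \end{bmatrix}
\]

to solve the system 

\[
\begin{aligned}
3x_1 - 6x_2 - 3x_3 &= -3 \\
2x_1 + 6x_3 &= -22 \\
-4x_1 + 7x_2 + 4x_3 &= 3
\end{aligned}
\]

In Exercises 3-6, find an LU-decomposition of the coefficient 
matrix, and then use the method of Example 1 to solve the system. 

3. \[ \begin{bmatrix} 2 & 8 \\ -1 & -1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} -2 \\ -2 \end{bmatrix} \]

4. \[ \begin{bmatrix} -5 & -10 \\ 6 & 5 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} -10 \\ 19 \end{bmatrix} \]

5. \[ \begin{bmatrix} 2 & -2 & -2 \\ 0 & -2 & 2 \\ -1 & 5 & 2 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} -4 \\ -2 \\ 6 \end{bmatrix} \]

6. \[ \begin{bmatrix} -3 & 12 & -6 \\ 1 & -2 & 2 \\ 0 & 1 & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} -33 \\ 7 \\ -1 \end{bmatrix} \]

In Exercises 7-8, an LU-decomposition of a matrix A is given. 

(a) Compute $L^{-1}$ and $U^{-1}$. 

(b) Use the result in part (a) to find the inverse of A. 

7. \[
A = \begin{bmatrix} 2 & -1 & 3 \\ 4 & 2 & 1 \\ -6 & -1 & 2 \end{bmatrix}; \qquad
A = LU =
\begin{bmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ -3 & -1 & 1 \end{bmatrix}
\begin{bmatrix} 2 & -1 & 3 \\ 0 & 4 & -5 \\ 0 & 0 & 6 \end{bmatrix}
\]

8. The LU-decomposition obtained in Example 2. 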
(a) Find an LU-decomposition of A. 

(b) Express A in the form $A = L_1 D U_1$, where $L_1$ is lower 
triangular with 1's along the main diagonal, $U_1$ is upper 
triangular, and D is a diagonal matrix. 

(c) Express A in the form $A = L_2 U_2$, where $L_2$ is lower 
triangular with 1's along the main diagonal and $U_2$ is upper 
triangular. 

10. (a) Show that the matrix 

\[ \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \]

has no LU-decomposition. 

(b) Find a PLU-decomposition of this matrix. 

In Exercises 11-12, use the given PLU-decomposition of A to 
solve the linear system Ax = b by rewriting it as $P^{-1}Ax = P^{-1}b$ 
and solving this system by LU-decomposition. 

11. \[
b = \begin{bmatrix} 2 \\ 1 \\ 5 \end{bmatrix}; \qquad
A = \begin{bmatrix} 0 & 1 & 4 \\ 1 & 2 & 2 \\ 3 & 1 & 3 \end{bmatrix};
\]
\[
A =
\begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 3 & -5 & 17 \end{bmatrix}
\begin{bmatrix} 1 & 2 & 2 \\ 0 & 1 & 4 \\ 0 & 0 & 1 \end{bmatrix}
= PLU
\]

12. \[
b = \begin{bmatrix} 3 \\ 0 \\ 6 \end{bmatrix}; \qquad
A = \begin{bmatrix} 4 & 1 & 2 \\ 0 & 2 & 1 \\ 8 & 1 & 8 \end{bmatrix};
\]
\[
A =
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix}
\begin{bmatrix} \ldots \end{bmatrix}
\begin{bmatrix} 4 & 1 & \ldots \\ 0 & -1 & \ldots \\ 0 & 0 & \ldots \end{bmatrix}
= PLU
\]

In Exercises 13-14, find the LDU-decomposition of A. 

13. \[ A = \begin{bmatrix} 2 & 2 \\ 4 & 1 \end{bmatrix} \]

14. \[ A = \begin{bmatrix} \ldots & -12 & 6 \\ \ldots & 2 & 0 \\ \ldots & -28 & 13 \end{bmatrix} \]

In Exercises 15-16, find a PLU-decomposition of A, and use 
it to solve the linear system Ax = b by the method of Exercises 11 
and 12. 

15. \[
A = \begin{bmatrix} 3 & -1 & 0 \\ 3 & -1 & 1 \\ 0 & 2 & 1 \end{bmatrix}; \qquad
b = \begin{bmatrix} -2 \\ 1 \\ 4 \end{bmatrix}
\]

16. \[
A = \begin{bmatrix} 0 & 3 & -2 \\ 1 & 1 & 4 \\ 2 & 2 & 5 \end{bmatrix}; \qquad
b = \begin{bmatrix} 7 \\ 5 \\ -2 \end{bmatrix}
\]

17. Let Ax = b be a linear system of n equations in n unknowns, 
and assume that A is an invertible matrix that can be reduced 
to row echelon form without row interchanges. How many 
additions and multiplications are required to solve the system 
by the method of Example 1? 


Working with Proofs 

18. Let 

\[ A = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \]

(a) Prove: If $a \neq 0$, then the matrix A has a unique 
LU-decomposition with 1's along the main diagonal of L. 

(b) Find the LU-decomposition described in part (a). 

19. Prove: If A is any n × n matrix, then A can be factored as 
A = PLU, where L is lower triangular, U is upper triangular, 
and P can be obtained by interchanging the rows of $I_n$ 
appropriately. [Hint: Let U be a row echelon form of A, and let 
all row interchanges required in the reduction of A to U be 
performed first.] 


True-False Exercises 

TF. In parts (a)-(e) determine whether the statement is true or 
false, and justify your answer. 

(a) Every square matrix has an LU-decomposition. 

(b) If a square matrix A is row equivalent to an upper triangular 
matrix U, then A has an LU-decomposition. 

(c) If $L_1, L_2, \ldots, L_k$ are n × n lower triangular matrices, then the 
product $L_1 L_2 \cdots L_k$ is lower triangular. 

(d) If an invertible matrix A has an LU-decomposition, then A 
has a unique LDU-decomposition. 

(e) Every square matrix has a PLU-decomposition. 

Working with Technology 

T1. Technology utilities vary in how they handle LU-decompositions. 
For example, many utilities perform row interchanges to 
reduce roundoff error and hence produce PLU-decompositions, 
even when asked for LU-decompositions. See what happens when 
you use your utility to find an LU-decomposition of the matrix A 
in Example 2. 


T2. The accompanying figure shows a metal plate whose edges are 
held at the temperatures shown. It follows from thermodynamic 
principles that the temperature at each of the six interior nodes will 
eventually stabilize at a value that is approximately the average of 
the temperatures at the four neighboring nodes. These are called 
the steady-state temperatures at the nodes. Thus, for example, if 
we denote the steady-state temperatures at the interior nodes in 
the accompanying figure as $T_1$, $T_2$, $T_3$, $T_4$, $T_5$, and $T_6$, then at the 
node labeled $T_1$ that temperature will be $T_1 = \tfrac{1}{4}(0 + 5 + T_2 + T_3)$ 
or, equivalently, 

\[ 4T_1 - T_2 - T_3 = 5 \]

Find a linear system whose solution gives the steady-state 
temperatures at the nodes, and use your technology utility to solve that 
system by LU-decomposition. 

◄ Figure Ex-T2 [figure not reproduced; surviving labels: interior nodes $T_1$, $T_3$, $T_5$ in the upper row and $T_2$, $T_4$, $T_6$ in the lower row; top edge 5°, bottom edge 20°] 


9.2 The Power Method 

The eigenvalues of a square matrix can, in theory, be found by solving the characteristic 
equation. However, this procedure has so many computational difficulties that it is almost 
never used in applications. In this section we will discuss an algorithm that can be used to 
approximate the eigenvalue with greatest absolute value and a corresponding eigenvector. 
This particular eigenvalue and its corresponding eigenvectors are important because they 
arise naturally in many iterative processes. The methods we will study in this section have 
recently been used to create Internet search engines such as Google. 

The Power Method 

There are many applications in which some vector $x_0$ in $R^n$ is multiplied repeatedly by 
an n × n matrix A to produce a sequence 

\[ x_0, \quad Ax_0, \quad A^2x_0, \quad \ldots, \quad A^kx_0, \quad \ldots \]

We call a sequence of this form a power sequence generated by A. In this section we 
will be concerned with the convergence of power sequences and how such sequences 
can be used to approximate eigenvalues and eigenvectors. For this purpose, we make 
the following definition. 


DEFINITION 1 If the distinct eigenvalues of a matrix A are $\lambda_1, \lambda_2, \ldots, \lambda_k$, and if 
$|\lambda_1|$ is larger than $|\lambda_2|, \ldots, |\lambda_k|$, then $\lambda_1$ is called a dominant eigenvalue of A. Any 
eigenvector corresponding to a dominant eigenvalue is called a dominant eigenvector 
of A. 


► EXAMPLE 1 Dominant Eigenvalues 

Some matrices have dominant eigenvalues and some do not. For example, if the distinct 
eigenvalues of a matrix are 

\[ \lambda_1 = -4, \quad \lambda_2 = -2, \quad \lambda_3 = 1, \quad \lambda_4 = 3 \]

then $\lambda_1 = -4$ is dominant since $|\lambda_1| = 4$ is greater than the absolute values of all the 
other eigenvalues; but if the distinct eigenvalues of a matrix are 

\[ \lambda_1 = 7, \quad \lambda_2 = -7, \quad \lambda_3 = -2, \quad \lambda_4 = 5 \]

then $|\lambda_1| = |\lambda_2| = 7$, so there is no eigenvalue whose absolute value is greater than the 
absolute value of all the other eigenvalues. 
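
The test in Definition 1 can be written as a few lines of code. This is only an illustrative sketch: the function name `dominant_eigenvalue` is invented here, and it assumes the eigenvalues are supplied exactly (with floating-point input, comparing absolute values for equality would need a tolerance).

```python
def dominant_eigenvalue(eigenvalues):
    """Return the dominant eigenvalue, or None if no eigenvalue is dominant."""
    distinct = set(eigenvalues)
    largest = max(abs(v) for v in distinct)
    # An eigenvalue is dominant only if it alone attains the largest absolute value.
    candidates = [v for v in distinct if abs(v) == largest]
    return candidates[0] if len(candidates) == 1 else None


print(dominant_eigenvalue([-4, -2, 1, 3]))  # first list of Example 1
print(dominant_eigenvalue([7, -7, -2, 5]))  # second list of Example 1
```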

The most important theorems about convergence of power sequences apply to n x n 
matrices with n linearly independent eigenvectors (symmetric matrices, for example), so 
we will limit our discussion to this case in this section. 
► Figure 9.2.1 


THEOREM 9.2.1 Let A be a symmetric n × n matrix that has a positive dominant 
eigenvalue $\lambda$. If $x_0$ is a unit vector in $R^n$ that is not orthogonal to the eigenspace 
corresponding to $\lambda$, then the normalized power sequence 

\[
x_0, \quad x_1 = \frac{Ax_0}{\|Ax_0\|}, \quad x_2 = \frac{Ax_1}{\|Ax_1\|}, \quad \ldots, \quad x_k = \frac{Ax_{k-1}}{\|Ax_{k-1}\|}, \quad \ldots
\tag{1}
\]

converges to a unit dominant eigenvector, and the sequence 

\[
Ax_1 \cdot x_1, \quad Ax_2 \cdot x_2, \quad Ax_3 \cdot x_3, \quad \ldots, \quad Ax_k \cdot x_k, \quad \ldots
\tag{2}
\]

converges to the dominant eigenvalue $\lambda$. 


Remark In the exercises we will ask you to show that (1) can also be expressed as 

\[
x_0, \quad x_1 = \frac{Ax_0}{\|Ax_0\|}, \quad x_2 = \frac{A^2x_0}{\|A^2x_0\|}, \quad \ldots, \quad x_k = \frac{A^kx_0}{\|A^kx_0\|}, \quad \ldots
\tag{3}
\]

This form of the power sequence expresses each iterate in terms of the starting vector $x_0$, rather 
than in terms of its predecessor. 

We will not prove Theorem 9.2.1, but we can make it plausible geometrically in the 
2 × 2 case where A is a symmetric matrix with distinct positive eigenvalues, $\lambda_1$ and $\lambda_2$, 
one of which is dominant. To be specific, assume that $\lambda_1$ is dominant and 

\[ \lambda_1 > \lambda_2 > 0 \]

Since we are assuming that A is symmetric and has distinct eigenvalues, it follows from 
Theorem 7.2.2 that the eigenspaces corresponding to $\lambda_1$ and $\lambda_2$ are perpendicular lines 
through the origin. Thus, the assumption that $x_0$ is a unit vector that is not orthogonal 
to the eigenspace corresponding to $\lambda_1$ implies that $x_0$ does not lie in the eigenspace 
corresponding to $\lambda_2$. To see the geometric effect of multiplying $x_0$ by A, it will be useful 
to split $x_0$ into the sum 

\[ x_0 = v_0 + w_0 \tag{4} \]

where $v_0$ and $w_0$ are the orthogonal projections of $x_0$ on the eigenspaces of $\lambda_1$ and $\lambda_2$, 
respectively (Figure 9.2.1a). This enables us to express $Ax_0$ as 

\[ Ax_0 = Av_0 + Aw_0 = \lambda_1 v_0 + \lambda_2 w_0 \tag{5} \]

which tells us that multiplying $x_0$ by A “scales” the terms $v_0$ and $w_0$ in (4) by $\lambda_1$ and $\lambda_2$, 
respectively. However, $\lambda_1$ is larger than $\lambda_2$, so the scaling is greater in the direction of $v_0$ 
than in the direction of $w_0$. Thus, multiplying $x_0$ by A “pulls” $x_0$ toward the eigenspace 
of $\lambda_1$, and normalizing produces a vector $x_1 = Ax_0/\|Ax_0\|$, which is on the unit circle 
and is closer to the eigenspace of $\lambda_1$ than $x_0$ (Figure 9.2.1b). Similarly, multiplying $x_1$ by 
A and normalizing produces a unit vector $x_2$ that is closer to the eigenspace of $\lambda_1$ than 
$x_1$. Thus, it seems reasonable that by repeatedly multiplying by A and normalizing we 
will produce a sequence of vectors $x_k$ that lie on the unit circle and converge to a unit 
vector x in the eigenspace of $\lambda_1$ (Figure 9.2.1c). Moreover, if $x_k$ converges to x, then it 
also seems reasonable that $Ax_k \cdot x_k$ will converge to 

\[ Ax \cdot x = \lambda_1 x \cdot x = \lambda_1 \|x\|^2 = \lambda_1 \]

which is the dominant eigenvalue of A. 

If the dominant eigenvalue is not positive, sequence (2) will still converge to the dominant eigenvalue, but 
sequence (1) may not converge to a specific dominant eigenvector because of alternation (see Exercise 11). 
Nevertheless, each term of (1) will closely approximate some dominant eigenvector for sufficiently large values of k. 


The Power Method with 
Euclidean Scaling 


Theorem 9.2.1 provides us with an algorithm for approximating the dominant eigenvalue 
and a corresponding unit eigenvector of a symmetric matrix A, provided the dominant 
eigenvalue is positive. This algorithm, called the power method with Euclidean scaling, is 
as follows: 


The Power Method with Euclidean Scaling 

Step 0. Choose an arbitrary nonzero vector and normalize it, if need be, to obtain a 
unit vector $x_0$. 

Step 1. Compute $Ax_0$ and normalize it to obtain the first approximation $x_1$ to a 
dominant unit eigenvector. Compute $Ax_1 \cdot x_1$ to obtain the first approximation 
to the dominant eigenvalue. 

Step 2. Compute $Ax_1$ and normalize it to obtain the second approximation $x_2$ to a 
dominant unit eigenvector. Compute $Ax_2 \cdot x_2$ to obtain the second 
approximation to the dominant eigenvalue. 

Step 3. Compute $Ax_2$ and normalize it to obtain the third approximation $x_3$ to a 
dominant unit eigenvector. Compute $Ax_3 \cdot x_3$ to obtain the third approximation 
to the dominant eigenvalue. 

Continuing in this way will usually generate a sequence of better and better 
approximations to the dominant eigenvalue and a corresponding unit eigenvector. 
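
The steps above can be sketched as a short routine. This is a minimal illustration under stated assumptions, not a robust implementation: the names `mat_vec`, `dot`, and `power_method_euclidean` are invented here, the number of iterations is fixed in advance instead of being chosen by a stopping test, and the sketch assumes that no iterate is mapped to the zero vector. It is run on the symmetric matrix with rows (3, 2) and (2, 3), starting from (1, 0).

```python
import math


def mat_vec(A, x):
    """Matrix-vector product Ax for a matrix stored as a list of rows."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]


def dot(u, v):
    """Euclidean inner product u . v."""
    return sum(a * b for a, b in zip(u, v))


def power_method_euclidean(A, x0, iterations):
    """Return a list of (x_k, lambda_k) pairs for k = 1, ..., iterations."""
    norm0 = math.sqrt(dot(x0, x0))
    x = [v / norm0 for v in x0]          # Step 0: start from a unit vector
    history = []
    for _ in range(iterations):
        y = mat_vec(A, x)                # compute A x_{k-1}
        norm = math.sqrt(dot(y, y))
        x = [v / norm for v in y]        # normalize to get x_k
        lam = dot(mat_vec(A, x), x)      # A x_k . x_k approximates the eigenvalue
        history.append((x, lam))
    return history


A = [[3, 2], [2, 3]]
for k, (x, lam) in enumerate(power_method_euclidean(A, [1, 0], 5), start=1):
    print(k, [round(v, 5) for v in x], round(lam, 5))
```

In practice one would usually stop when successive eigenvalue estimates agree to a chosen tolerance; the fixed iteration count is used here only to keep the sketch short.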


► EXAMPLE 2 The Power Method with Euclidean Scaling 

Apply the power method with Euclidean scaling to 

\[
A = \begin{bmatrix} 3 & 2 \\ 2 & 3 \end{bmatrix}
\quad \text{with} \quad
x_0 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}
\]

Stop at $x_5$ and compare the resulting approximations to the exact values of the dominant 
eigenvalue and eigenvector. 


If the vector $x_0$ happens to be orthogonal to the eigenspace of the dominant eigenvalue, then the hypotheses 
of Theorem 9.2.1 will be violated and the method may fail. However, the reality is that computer roundoff 
errors usually perturb $x_0$ enough to destroy any orthogonality and make the algorithm work. This is one 
instance in which errors help to obtain correct results! 
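Solution We will leave it for you to show that the eigenvalues of A are $\lambda = 1$ and $\lambda = 5$ 
and that the eigenspace corresponding to the dominant eigenvalue $\lambda = 5$ is the line 
represented by the parametric equations $x_1 = t$, $x_2 = t$, which we can write in vector 
form as 

\[
x = t \begin{bmatrix} 1 \\ 1 \end{bmatrix}
\tag{6}
\]

Setting $t = 1/\sqrt{2}$ yields the normalized dominant eigenvector 

\[
v_1 = \begin{bmatrix} \frac{1}{\sqrt{2}} \\[2pt] \frac{1}{\sqrt{2}} \end{bmatrix}
\approx \begin{bmatrix} 0.707106781187\ldots \\ 0.707106781187\ldots \end{bmatrix}
\tag{7}
\]

Now let us see what happens when we use the power method, starting with the unit 
vector $x_0$. 

\[
\begin{aligned}
Ax_0 &= \begin{bmatrix} 3 & 2 \\ 2 & 3 \end{bmatrix} \begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 3 \\ 2 \end{bmatrix},
& x_1 &= \frac{Ax_0}{\|Ax_0\|} = \frac{1}{\sqrt{13}} \begin{bmatrix} 3 \\ 2 \end{bmatrix} \approx \frac{1}{3.60555} \begin{bmatrix} 3 \\ 2 \end{bmatrix} \approx \begin{bmatrix} 0.83205 \\ 0.55470 \end{bmatrix} \\
Ax_1 &\approx \begin{bmatrix} 3 & 2 \\ 2 & 3 \end{bmatrix} \begin{bmatrix} 0.83205 \\ 0.55470 \end{bmatrix} \approx \begin{bmatrix} 3.60555 \\ 3.32820 \end{bmatrix},
& x_2 &= \frac{Ax_1}{\|Ax_1\|} \approx \frac{1}{4.90682} \begin{bmatrix} 3.60555 \\ 3.32820 \end{bmatrix} \approx \begin{bmatrix} 0.73480 \\ 0.67828 \end{bmatrix} \\
Ax_2 &\approx \begin{bmatrix} 3 & 2 \\ 2 & 3 \end{bmatrix} \begin{bmatrix} 0.73480 \\ 0.67828 \end{bmatrix} \approx \begin{bmatrix} 3.56097 \\ 3.50445 \end{bmatrix},
& x_3 &= \frac{Ax_2}{\|Ax_2\|} \approx \frac{1}{4.99616} \begin{bmatrix} 3.56097 \\ 3.50445 \end{bmatrix} \approx \begin{bmatrix} 0.71274 \\ 0.70143 \end{bmatrix} \\
Ax_3 &\approx \begin{bmatrix} 3 & 2 \\ 2 & 3 \end{bmatrix} \begin{bmatrix} 0.71274 \\ 0.70143 \end{bmatrix} \approx \begin{bmatrix} 3.54108 \\ 3.52976 \end{bmatrix},
& x_4 &= \frac{Ax_3}{\|Ax_3\|} \approx \frac{1}{4.99985} \begin{bmatrix} 3.54108 \\ 3.52976 \end{bmatrix} \approx \begin{bmatrix} 0.70824 \\ 0.70597 \end{bmatrix} \\
Ax_4 &\approx \begin{bmatrix} 3 & 2 \\ 2 & 3 \end{bmatrix} \begin{bmatrix} 0.70824 \\ 0.70597 \end{bmatrix} \approx \begin{bmatrix} 3.53666 \\ 3.53440 \end{bmatrix},
& x_5 &= \frac{Ax_4}{\|Ax_4\|} \approx \frac{1}{4.99999} \begin{bmatrix} 3.53666 \\ 3.53440 \end{bmatrix} \approx \begin{bmatrix} 0.70733 \\ 0.70688 \end{bmatrix}
\end{aligned}
\]

\[
\begin{aligned}
\lambda^{(1)} &= (Ax_1) \cdot x_1 = (Ax_1)^T x_1 \approx \begin{bmatrix} 3.60555 & 3.32820 \end{bmatrix} \begin{bmatrix} 0.83205 \\ 0.55470 \end{bmatrix} \approx 4.84615 \\
\lambda^{(2)} &= (Ax_2) \cdot x_2 = (Ax_2)^T x_2 \approx \begin{bmatrix} 3.56097 & 3.50445 \end{bmatrix} \begin{bmatrix} 0.73480 \\ 0.67828 \end{bmatrix} \approx 4.99361 \\
\lambda^{(3)} &= (Ax_3) \cdot x_3 = (Ax_3)^T x_3 \approx \begin{bmatrix} 3.54108 & 3.52976 \end{bmatrix} \begin{bmatrix} 0.71274 \\ 0.70143 \end{bmatrix} \approx 4.99974 \\
\lambda^{(4)} &= (Ax_4) \cdot x_4 = (Ax_4)^T x_4 \approx \begin{bmatrix} 3.53666 & 3.53440 \end{bmatrix} \begin{bmatrix} 0.70824 \\ 0.70597 \end{bmatrix} \approx 4.99999 \\
\lambda^{(5)} &= (Ax_5) \cdot x_5 = (Ax_5)^T x_5 \approx \begin{bmatrix} 3.53576 & 3.53531 \end{bmatrix} \begin{bmatrix} 0.70733 \\ 0.70688 \end{bmatrix} \approx 5.00000
\end{aligned}
\]

Thus, $\lambda^{(5)}$ approximates the dominant eigenvalue to five decimal place accuracy and $x_5$ 
approximates the dominant eigenvector in (7) to three decimal place accuracy. ◄ 

It is accidental that $\lambda^{(5)}$ (the fifth approximation) produced five decimal place accuracy. In 
general, n iterations need not produce n decimal place accuracy. 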


The Power Method with 
Maximum Entry Scaling 

There is a variation of the power method in which the iterates, rather than being normalized 
at each stage, are scaled to make the maximum entry 1. To describe this method, it 
will be convenient to denote the maximum absolute value of the entries in a vector x by 
max(x). Thus, for example, if 

\[
x = \begin{bmatrix} 5 \\ 3 \\ -7 \\ 2 \end{bmatrix}
\]

then max(x) = 7. We will need the following variation of Theorem 9.2.1. 


THEOREM 9.2.2 Let A be a symmetric n × n matrix that has a positive dominant 
eigenvalue $\lambda$. If $x_0$ is a nonzero vector in $R^n$ that is not orthogonal to the eigenspace 
corresponding to $\lambda$, then the sequence 

\[
x_0, \quad x_1 = \frac{Ax_0}{\max(Ax_0)}, \quad x_2 = \frac{Ax_1}{\max(Ax_1)}, \quad \ldots, \quad x_k = \frac{Ax_{k-1}}{\max(Ax_{k-1})}, \quad \ldots
\tag{8}
\]

converges to an eigenvector corresponding to $\lambda$, and the sequence 

\[
\frac{Ax_1 \cdot x_1}{x_1 \cdot x_1}, \quad \frac{Ax_2 \cdot x_2}{x_2 \cdot x_2}, \quad \frac{Ax_3 \cdot x_3}{x_3 \cdot x_3}, \quad \ldots, \quad \frac{Ax_k \cdot x_k}{x_k \cdot x_k}, \quad \ldots
\tag{9}
\]

converges to $\lambda$. 


Remark In the exercises we will ask you to show that (8) can be written in the alternative form 

\[
x_0, \quad x_1 = \frac{Ax_0}{\max(Ax_0)}, \quad x_2 = \frac{A^2x_0}{\max(A^2x_0)}, \quad \ldots, \quad x_k = \frac{A^kx_0}{\max(A^kx_0)}, \quad \ldots
\tag{10}
\]

which expresses the iterates in terms of the initial vector $x_0$. 


We will omit the proof of this theorem, but if we accept that (8) converges to an 
eigenvector of A, then it is not hard to see why (9) converges to the dominant eigenvalue. 
To see this, note that each term in (9) is of the form 

\[
\frac{Ax \cdot x}{x \cdot x}
\tag{11}
\]

which is called a Rayleigh quotient of A. In the case where $\lambda$ is an eigenvalue of A and x 
is a corresponding eigenvector, the Rayleigh quotient is 

\[
\frac{Ax \cdot x}{x \cdot x} = \frac{\lambda x \cdot x}{x \cdot x} = \frac{\lambda (x \cdot x)}{x \cdot x} = \lambda
\]

Thus, if $x_k$ converges to a dominant eigenvector x, then it seems reasonable that 

\[
\frac{Ax_k \cdot x_k}{x_k \cdot x_k} \quad \text{converges to} \quad \frac{Ax \cdot x}{x \cdot x} = \lambda
\]

which is the dominant eigenvalue. 

Theorem 9.2.2 produces the following algorithm, which is called the power method 
with maximum entry scaling. 


As in Theorem 9.2.1, if the dominant eigenvalue is not positive, sequence (9) will still converge to the dominant 
eigenvalue, but sequence (8) may not converge to a specific dominant eigenvector. Nevertheless, each term of 
(8) will closely approximate some dominant eigenvector for sufficiently large values of k (see Exercise 11). 

John William Strutt Rayleigh 
(1842-1919) 


Historical Note The British mathematical physicist John Rayleigh 
won the Nobel prize in physics in 1904 for his discovery of the inert 
gas argon. Rayleigh also made fundamental discoveries in acoustics 
and optics, and his work in wave phenomena enabled him to 
give the first accurate explanation of why the sky is blue. 

[Image: The Granger Collection, 
New York] 



Whereas the power method with Euclidean scaling produces a sequence that approaches a unit dominant eigenvector, maximum entry scaling produces a sequence that approaches an eigenvector whose largest component is 1.


The Power Method with Maximum Entry Scaling 

Step 0. Choose an arbitrary nonzero vector $\mathbf{x}_0$.

Step 1. Compute $A\mathbf{x}_0$ and multiply it by the factor $1/\max(A\mathbf{x}_0)$ to obtain the first approximation $\mathbf{x}_1$ to a dominant eigenvector. Compute the Rayleigh quotient of $\mathbf{x}_1$ to obtain the first approximation to the dominant eigenvalue.

Step 2. Compute $A\mathbf{x}_1$ and scale it by the factor $1/\max(A\mathbf{x}_1)$ to obtain the second approximation $\mathbf{x}_2$ to a dominant eigenvector. Compute the Rayleigh quotient of $\mathbf{x}_2$ to obtain the second approximation to the dominant eigenvalue.

Step 3. Compute $A\mathbf{x}_2$ and scale it by the factor $1/\max(A\mathbf{x}_2)$ to obtain the third approximation $\mathbf{x}_3$ to a dominant eigenvector. Compute the Rayleigh quotient of $\mathbf{x}_3$ to obtain the third approximation to the dominant eigenvalue.

Continuing in this way will generate a sequence of better and better approximations to the dominant eigenvalue and a corresponding eigenvector.
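
The steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not part of the text's development: the function names (`mat_vec`, `dot`, `max_entry`, `power_method_max_scaling`) are hypothetical, matrices are assumed to be stored as lists of rows, and the dominant eigenvalue is assumed to be positive as in Theorem 9.2.2.

```python
def mat_vec(A, x):
    """Return the product Ax for a matrix A stored as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def dot(u, v):
    """Euclidean inner product of two vectors."""
    return sum(a * b for a, b in zip(u, v))


def max_entry(x):
    """max(x): the largest absolute value among the entries of x."""
    return max(abs(xi) for xi in x)


def power_method_max_scaling(A, x0, steps):
    """Return the iterates x_1, ..., x_steps and their Rayleigh quotients."""
    x = list(x0)
    vectors, quotients = [], []
    for _ in range(steps):
        y = mat_vec(A, x)                 # compute A x_{k-1}
        m = max_entry(y)
        x = [yi / m for yi in y]          # scale by 1 / max(A x_{k-1})
        vectors.append(x)
        quotients.append(dot(mat_vec(A, x), x) / dot(x, x))  # Rayleigh quotient
    return vectors, quotients


# Hypothetical usage with a 2 x 2 symmetric matrix and starting vector (1, 0).
vectors, quotients = power_method_max_scaling([[3, 2], [2, 3]], [1, 0], 5)
for k, (v, q) in enumerate(zip(vectors, quotients), start=1):
    print(k, [round(t, 5) for t in v], round(q, 5))
```

Plain lists are used here so that the sketch has no dependencies; a numerical library would normally be preferred for large matrices.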


► EXAMPLE 3 Example 2 Revisited Using Maximum Entry Scaling

Apply the power method with maximum entry scaling to

$$A = \begin{bmatrix} 3 & 2 \\ 2 & 3 \end{bmatrix} \quad\text{with}\quad \mathbf{x}_0 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$$

Stop at $\mathbf{x}_5$ and compare the resulting approximations to the exact values and to the approximations obtained in Example 2.


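Solution We leave it for you to confirm that

$$A\mathbf{x}_0 = \begin{bmatrix} 3 & 2 \\ 2 & 3 \end{bmatrix}\begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 3 \\ 2 \end{bmatrix}, \qquad \mathbf{x}_1 = \frac{A\mathbf{x}_0}{\max(A\mathbf{x}_0)} = \frac{1}{3}\begin{bmatrix} 3 \\ 2 \end{bmatrix} \approx \begin{bmatrix} 1.00000 \\ 0.66667 \end{bmatrix}$$

$$A\mathbf{x}_1 \approx \begin{bmatrix} 3 & 2 \\ 2 & 3 \end{bmatrix}\begin{bmatrix} 1.00000 \\ 0.66667 \end{bmatrix} \approx \begin{bmatrix} 4.33333 \\ 4.00000 \end{bmatrix}, \qquad \mathbf{x}_2 = \frac{A\mathbf{x}_1}{\max(A\mathbf{x}_1)} \approx \frac{1}{4.33333}\begin{bmatrix} 4.33333 \\ 4.00000 \end{bmatrix} \approx \begin{bmatrix} 1.00000 \\ 0.92308 \end{bmatrix}$$

$$A\mathbf{x}_2 \approx \begin{bmatrix} 3 & 2 \\ 2 & 3 \end{bmatrix}\begin{bmatrix} 1.00000 \\ 0.92308 \end{bmatrix} \approx \begin{bmatrix} 4.84615 \\ 4.76923 \end{bmatrix}, \qquad \mathbf{x}_3 = \frac{A\mathbf{x}_2}{\max(A\mathbf{x}_2)} \approx \frac{1}{4.84615}\begin{bmatrix} 4.84615 \\ 4.76923 \end{bmatrix} \approx \begin{bmatrix} 1.00000 \\ 0.98413 \end{bmatrix}$$

$$A\mathbf{x}_3 \approx \begin{bmatrix} 3 & 2 \\ 2 & 3 \end{bmatrix}\begin{bmatrix} 1.00000 \\ 0.98413 \end{bmatrix} \approx \begin{bmatrix} 4.96825 \\ 4.95238 \end{bmatrix}, \qquad \mathbf{x}_4 = \frac{A\mathbf{x}_3}{\max(A\mathbf{x}_3)} \approx \frac{1}{4.96825}\begin{bmatrix} 4.96825 \\ 4.95238 \end{bmatrix} \approx \begin{bmatrix} 1.00000 \\ 0.99681 \end{bmatrix}$$

$$A\mathbf{x}_4 \approx \begin{bmatrix} 3 & 2 \\ 2 & 3 \end{bmatrix}\begin{bmatrix} 1.00000 \\ 0.99681 \end{bmatrix} \approx \begin{bmatrix} 4.99361 \\ 4.99042 \end{bmatrix}, \qquad \mathbf{x}_5 = \frac{A\mathbf{x}_4}{\max(A\mathbf{x}_4)} \approx \frac{1}{4.99361}\begin{bmatrix} 4.99361 \\ 4.99042 \end{bmatrix} \approx \begin{bmatrix} 1.00000 \\ 0.99936 \end{bmatrix}$$

$$\lambda^{(1)} = \frac{A\mathbf{x}_1 \cdot \mathbf{x}_1}{\mathbf{x}_1 \cdot \mathbf{x}_1} = \frac{(A\mathbf{x}_1)^T\mathbf{x}_1}{\mathbf{x}_1^T\mathbf{x}_1} \approx \frac{7.00000}{1.44444} \approx 4.84615$$

$$\lambda^{(2)} = \frac{A\mathbf{x}_2 \cdot \mathbf{x}_2}{\mathbf{x}_2 \cdot \mathbf{x}_2} = \frac{(A\mathbf{x}_2)^T\mathbf{x}_2}{\mathbf{x}_2^T\mathbf{x}_2} \approx \frac{9.24852}{1.85207} \approx 4.99361$$

$$\lambda^{(3)} = \frac{A\mathbf{x}_3 \cdot \mathbf{x}_3}{\mathbf{x}_3 \cdot \mathbf{x}_3} = \frac{(A\mathbf{x}_3)^T\mathbf{x}_3}{\mathbf{x}_3^T\mathbf{x}_3} \approx \frac{9.84203}{1.96851} \approx 4.99974$$

$$\lambda^{(4)} = \frac{A\mathbf{x}_4 \cdot \mathbf{x}_4}{\mathbf{x}_4 \cdot \mathbf{x}_4} = \frac{(A\mathbf{x}_4)^T\mathbf{x}_4}{\mathbf{x}_4^T\mathbf{x}_4} \approx \frac{9.96808}{1.99362} \approx 4.99999$$

$$\lambda^{(5)} = \frac{A\mathbf{x}_5 \cdot \mathbf{x}_5}{\mathbf{x}_5 \cdot \mathbf{x}_5} = \frac{(A\mathbf{x}_5)^T\mathbf{x}_5}{\mathbf{x}_5^T\mathbf{x}_5} \approx \frac{9.99360}{1.99872} \approx 5.00000$$

Thus, $\lambda^{(5)}$ approximates the dominant eigenvalue correctly to five decimal places and $\mathbf{x}_5$ closely approximates the dominant eigenvector

$$\mathbf{x} = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$$

that results by taking $t = 1$ in (6).


Rate of Convergence

If $A$ is a symmetric matrix whose distinct eigenvalues can be arranged so that

$$|\lambda_1| > |\lambda_2| \geq |\lambda_3| \geq \cdots \geq |\lambda_k|$$

then the "rate" at which the Rayleigh quotients converge to the dominant eigenvalue $\lambda_1$ depends on the ratio $|\lambda_1|/|\lambda_2|$; that is, the convergence is slow when this ratio is near 1 and rapid when it is large: the greater the ratio, the more rapid the convergence. For example, if $A$ is a $2 \times 2$ symmetric matrix, then the greater the ratio $|\lambda_1|/|\lambda_2|$, the greater the disparity between the scaling effects of $\lambda_1$ and $\lambda_2$ in Figure 9.2.1, and hence the greater the effect that multiplication by $A$ has on pulling the iterates toward the eigenspace of $\lambda_1$. Indeed, the rapid convergence in Example 3 is due to the fact that $|\lambda_1|/|\lambda_2| = 5/1 = 5$, which is considered to be a large ratio. In cases where the ratio is close to 1, the convergence of the power method may be so slow that other methods must be used.


Stopping Procedures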


If $\lambda$ is the exact value of the dominant eigenvalue, and if a power method produces the approximation $\lambda^{(k)}$ at the $k$th iteration, then we call

$$\left| \frac{\lambda - \lambda^{(k)}}{\lambda} \right| \tag{12}$$

the relative error in $\lambda^{(k)}$. If this is expressed as a percentage, then it is called the percentage error in $\lambda^{(k)}$. For example, if $\lambda = 5$ and the approximation after three iterations is $\lambda^{(3)} = 5.1$, then

$$\text{relative error in } \lambda^{(3)} = \left| \frac{\lambda - \lambda^{(3)}}{\lambda} \right| = \left| \frac{5 - 5.1}{5} \right| = |-0.02| = 0.02$$

$$\text{percentage error in } \lambda^{(3)} = 0.02 \times 100\% = 2\%$$

In applications one usually knows the relative error $E$ that can be tolerated in the dominant eigenvalue, so the goal is to stop computing iterates once the relative error in the approximation to that eigenvalue is less than $E$. However, there is a problem in computing the relative error from (12) in that the eigenvalue $\lambda$ is unknown. To circumvent this problem, it is usual to estimate $\lambda$ by $\lambda^{(k)}$ and stop the computations when

$$\left| \frac{\lambda^{(k)} - \lambda^{(k-1)}}{\lambda^{(k)}} \right| < E \tag{13}$$

The quantity on the left side of (13) is called the estimated relative error in $\lambda^{(k)}$ and its percentage form is called the estimated percentage error in $\lambda^{(k)}$.


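► EXAMPLE 4 Estimated Relative Error

For the computations in Example 3, find the smallest value of $k$ for which the estimated percentage error in $\lambda^{(k)}$ is less than 0.1%.

Solution The estimated percentage errors in the approximations in Example 3 are as follows:

$$\begin{array}{lll}
\text{APPROXIMATION} & \text{RELATIVE ERROR} & \text{PERCENTAGE ERROR} \\[4pt]
\lambda^{(2)}: & \left| \dfrac{\lambda^{(2)} - \lambda^{(1)}}{\lambda^{(2)}} \right| \approx \left| \dfrac{4.99361 - 4.84615}{4.99361} \right| \approx 0.02953 & = 2.953\% \\[8pt]
\lambda^{(3)}: & \left| \dfrac{\lambda^{(3)} - \lambda^{(2)}}{\lambda^{(3)}} \right| \approx \left| \dfrac{4.99974 - 4.99361}{4.99974} \right| \approx 0.00123 & = 0.123\% \\[8pt]
\lambda^{(4)}: & \left| \dfrac{\lambda^{(4)} - \lambda^{(3)}}{\lambda^{(4)}} \right| \approx \left| \dfrac{4.99999 - 4.99974}{4.99999} \right| \approx 0.00005 & = 0.005\% \\[8pt]
\lambda^{(5)}: & \left| \dfrac{\lambda^{(5)} - \lambda^{(4)}}{\lambda^{(5)}} \right| \approx \left| \dfrac{5.00000 - 4.99999}{5.00000} \right| \approx 0.00000 & = 0\%
\end{array}$$

Thus, $\lambda^{(4)} = 4.99999$ is the first approximation whose estimated percentage error is less than 0.1%.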


Remark A rule for deciding when to stop an iterative process is called a stopping procedure. In 
the exercises, we will discuss stopping procedures for the power method that are based on the 
dominant eigenvector rather than the dominant eigenvalue. 
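
A stopping procedure based on the estimated relative error in (13) might be sketched as follows. The helper names (`estimated_relative_errors`, `first_within_tolerance`) are hypothetical, and the list `lams` simply holds the five-decimal Rayleigh quotients quoted in Example 3; the sketch only illustrates the rule and is not a prescribed implementation.

```python
def estimated_relative_errors(approximations):
    """Estimated relative error |(cur - prev) / cur| for each approximation
    after the first, following Formula (13)."""
    return [
        abs((cur - prev) / cur)
        for prev, cur in zip(approximations, approximations[1:])
    ]


def first_within_tolerance(approximations, tolerance):
    """Return the index k (counting the first approximation as k = 1) of the
    first approximation whose estimated relative error is below the tolerance,
    or None if no approximation qualifies."""
    errors = estimated_relative_errors(approximations)
    for k, err in enumerate(errors, start=2):
        if err < tolerance:
            return k
    return None


# Rayleigh quotients lambda^(1), ..., lambda^(5) as quoted in Example 3.
lams = [4.84615, 4.99361, 4.99974, 4.99999, 5.00000]
errors = estimated_relative_errors(lams)
k = first_within_tolerance(lams, 0.001)  # 0.1% written as a fraction
print([round(e, 5) for e in errors], k)
```

In an actual computation the test would normally be made inside the iteration loop, so that no iterates are computed after the tolerance is met.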


Exercise Set 9.2 


In Exercises 1-2, the distinct eigenvalues of a matrix are given. Determine whether A has a dominant eigenvalue, and if so, find it.

1. (a) $\lambda_1 = 7$, $\lambda_2 = 3$, $\lambda_3 = -8$, $\lambda_4 = 1$

(b) $\lambda_1 = -5$, $\lambda_2 = 3$, $\lambda_3 = 2$, $\lambda_4 = 5$

2. (a) $\lambda_1 = 1$, $\lambda_2 = 0$, $\lambda_3 = -3$, $\lambda_4 = 2$

(b) $\lambda_1 = -3$, $\lambda_2 = -2$, $\lambda_3 = -1$, $\lambda_4 = 3$

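In Exercises 3-4, apply the power method with Euclidean scaling to the matrix A, starting with $\mathbf{x}_0$ and stopping at $\mathbf{x}_4$. Compare the resulting approximations to the exact values of the dominant eigenvalue and the corresponding unit eigenvector.

3. $A = \begin{bmatrix} 5 & -1 \\ -1 & -1 \end{bmatrix}$; $\mathbf{x}_0 =$

4. $A = \begin{bmatrix} 7 & -2 & 0 \\ -2 & 6 & -2 \\ 0 & -2 & 5 \end{bmatrix}$; $\mathbf{x}_0 = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}$

In Exercises 5-6, apply the power method with maximum entry scaling to the matrix A, starting with $\mathbf{x}_0$ and stopping at $\mathbf{x}_4$. Compare the resulting approximations to the exact values of the dominant eigenvalue and the corresponding scaled eigenvector.

5. $A = \begin{bmatrix} 1 & -3 \\ -3 & 5 \end{bmatrix}$; $\mathbf{x}_0 =$

6. $A = \begin{bmatrix} 3 & 2 & 2 \\ 2 & 2 & 0 \\ 2 & 0 & 4 \end{bmatrix}$; $\mathbf{x}_0 =$

7. Let

$A = \begin{bmatrix} 2 & -1 \\ -1 & 2 \end{bmatrix}$; $\mathbf{x}_0 =$

(a) Use the power method with maximum entry scaling to approximate a dominant eigenvector of A. Start with $\mathbf{x}_0$, round off all computations to three decimal places, and stop after three iterations.

(b) Use the result in part (a) and the Rayleigh quotient to approximate the dominant eigenvalue of A.

(c) Find the exact values of the eigenvector and eigenvalue approximated in parts (a) and (b).

(d) Find the percentage error in the approximation of the dominant eigenvalue.

8. Repeat the directions of Exercise 7 with

$A = \begin{bmatrix} 2 & 1 & 0 \\ 1 & 2 & 0 \\ 0 & 0 & 10 \end{bmatrix}$; $\mathbf{x}_0 = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$

In Exercises 9-10, a matrix A with a dominant eigenvalue and a sequence $\mathbf{x}_0, A\mathbf{x}_0, \ldots, A^5\mathbf{x}_0$ are given. Use Formulas (9) and (10) to approximate the dominant eigenvalue and a corresponding eigenvector.

9. $A = \begin{bmatrix} 1 & 2 \\ 2 & 1 \end{bmatrix}$; $\mathbf{x}_0 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$, $A\mathbf{x}_0 = \begin{bmatrix} 1 \\ 2 \end{bmatrix}$, $A^2\mathbf{x}_0 = \begin{bmatrix} 5 \\ 4 \end{bmatrix}$, $A^3\mathbf{x}_0 = \begin{bmatrix} 13 \\ 14 \end{bmatrix}$, $A^4\mathbf{x}_0 = \begin{bmatrix} 41 \\ 40 \end{bmatrix}$, $A^5\mathbf{x}_0 = \begin{bmatrix} 121 \\ 122 \end{bmatrix}$

10. $A = \begin{bmatrix} 1 & 2 \\ 2 & 1 \end{bmatrix}$; $\mathbf{x}_0 = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$, $A\mathbf{x}_0 = \begin{bmatrix} 2 \\ 1 \end{bmatrix}$, $A^2\mathbf{x}_0 = \begin{bmatrix} 4 \\ 5 \end{bmatrix}$, $A^3\mathbf{x}_0 = \begin{bmatrix} 14 \\ 13 \end{bmatrix}$, $A^4\mathbf{x}_0 = \begin{bmatrix} 40 \\ 41 \end{bmatrix}$, $A^5\mathbf{x}_0 = \begin{bmatrix} 122 \\ 121 \end{bmatrix}$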


11. Consider matrices

$A = \begin{bmatrix} -1 & 0 \\ 0 & 0 \end{bmatrix}$ and $\mathbf{x}_0 =$

where $\mathbf{x}_0$ is a unit vector and $a \neq 0$. Show that even though the matrix A is symmetric and has a dominant eigenvalue, the power sequence (1) in Theorem 9.2.1 does not converge. This shows that the requirement in that theorem that the dominant eigenvalue be positive is essential.


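12. Use the power method with Euclidean scaling to approximate the dominant eigenvalue and a corresponding eigenvector of A. Choose your own starting vector, and stop when the estimated percentage error in the eigenvalue approximation is less than 0.1%.

(a) $\begin{bmatrix} 1 & 3 & 3 \\ 3 & 4 & -1 \\ 3 & -1 & 10 \end{bmatrix}$

(b) $\begin{bmatrix} \cdot & \cdot & \cdot & \cdot \\ 0 & 2 & -1 & 1 \\ 1 & -1 & 4 & 1 \\ 1 & 1 & 1 & 8 \end{bmatrix}$

13. Repeat Exercise 12, but this time stop when all corresponding entries in two successive eigenvector approximations differ by less than 0.01 in absolute value.

14. Repeat Exercise 12 using maximum entry scaling.


Working with Proofs

15. Prove: If A is a nonzero $n \times n$ matrix, then $A^TA$ and $AA^T$ have positive dominant eigenvalues.

16. (For readers familiar with proof by induction) Let A be an $n \times n$ matrix, let $\mathbf{x}_0$ be a unit vector in $R^n$, and define the sequence $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_k, \ldots$ by

$$\mathbf{x}_1 = \frac{A\mathbf{x}_0}{\|A\mathbf{x}_0\|},\quad \mathbf{x}_2 = \frac{A\mathbf{x}_1}{\|A\mathbf{x}_1\|},\quad \ldots,\quad \mathbf{x}_k = \frac{A\mathbf{x}_{k-1}}{\|A\mathbf{x}_{k-1}\|},\quad \ldots$$

Prove by induction that $\mathbf{x}_k = A^k\mathbf{x}_0 / \|A^k\mathbf{x}_0\|$.

17. (For readers familiar with proof by induction) Let A be an $n \times n$ matrix, let $\mathbf{x}_0$ be a nonzero vector in $R^n$, and define the sequence $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_k, \ldots$ by

$$\mathbf{x}_1 = \frac{A\mathbf{x}_0}{\max(A\mathbf{x}_0)},\quad \mathbf{x}_2 = \frac{A\mathbf{x}_1}{\max(A\mathbf{x}_1)},\quad \ldots,\quad \mathbf{x}_k = \frac{A\mathbf{x}_{k-1}}{\max(A\mathbf{x}_{k-1})},\quad \ldots$$

Prove by induction that

$$\mathbf{x}_k = \frac{A^k\mathbf{x}_0}{\max(A^k\mathbf{x}_0)}$$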


Working with Technology 

T1. Use your technology utility to duplicate the computations in Example 2.

T2. Use your technology utility to duplicate the computations in Example 3.


9.3 Comparison of Procedures for Solving 
Linear Systems 

There is an old saying that “time is money.” This is especially true in industry where the 
cost of solving a linear system is generally determined by the time it takes for a computer 
to perform the required computations. This typically depends both on the speed of the 
computer processor and on the number of operations required by the algorithm. Thus, 
choosing the right algorithm has important financial implications in an industrial or 
research setting. In this section we will discuss some of the factors that affect the choice of 
algorithms for solving large-scale linear systems. 


Flops and the Cost of Solving a Linear System

In computer jargon, an arithmetic operation (+, −, ×, ÷) on two real numbers is called a flop, which is an acronym for "floating-point operation." The total number of flops required to solve a problem, which is called the cost of the solution, provides a convenient way of choosing between various algorithms for solving the problem. When needed, the cost in flops can be converted to units of time or money if the speed of the computer processor and the financial aspects of its operation are known. For example, today's fastest computers are capable of performing in excess of 17 petaflops/s (1 petaflop = $10^{15}$ flops). Thus, an algorithm that costs 1,000,000 flops would be performed in about $6 \times 10^{-11}$ second. By contrast, today's personal computers can perform in excess of 80 gigaflops/s (1 gigaflop = $10^{9}$ flops). Thus, an algorithm that costs 1,000,000 flops would be performed on a personal computer in 0.0000125 second.

Real numbers are stored in computers as numerical approximations called floating-point numbers. In base 10, a floating-point number has the form $\pm .d_1 d_2 \cdots d_n \times 10^m$, where $m$ is an integer, called the exponent, and $n$ is the number of digits to the right of the decimal point. The value of $n$ varies with the computer. In some literature the term flop is used as a measure of processing speed and stands for "floating-point operations per second." In our usage it is interpreted as a counting unit.

It is now common in computer jargon to write "FLOPs" to mean the number of "flops per second." However, we will write "flops" simply as the plural of "flop." When needed, we will write flops per second as flops/s.
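
As a rough illustration of the conversion from cost to time, the quotient cost/speed can be computed directly. The function name `seconds_for` and the constants below are hypothetical, and the two speeds are just the lower bounds quoted above (17 petaflops/s and 80 gigaflops/s), so the results are upper bounds on the running time.

```python
PETA = 10**15  # flops in one petaflop
GIGA = 10**9   # flops in one gigaflop


def seconds_for(cost_in_flops, flops_per_second):
    """Time in seconds needed to execute the given number of flops."""
    return cost_in_flops / flops_per_second


cost = 1_000_000
supercomputer_time = seconds_for(cost, 17 * PETA)  # roughly 5.9e-11 s
pc_time = seconds_for(cost, 80 * GIGA)             # 1.25e-5 s
print(supercomputer_time, pc_time)
```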

To illustrate how costs (in flops) can be computed, let us count the number of flops required to solve a linear system of n equations in n unknowns by Gauss-Jordan elimination. For this purpose we will need the following formulas for the sum of the first n positive integers and the sum of the squares of the first n positive integers:

$$1 + 2 + 3 + \cdots + n = \frac{n(n+1)}{2} \tag{1}$$

$$1^2 + 2^2 + 3^2 + \cdots + n^2 = \frac{n(n+1)(2n+1)}{6} \tag{2}$$


Let Ax = b be a linear system of n equations in n unknowns to be solved by Gauss- 
Jordan elimination (or, equivalently, by Gaussian elimination with back substitution). 
For simplicity, let us assume that A is invertible and that no row interchanges are re- 
quired to reduce the augmented matrix [A | b] to row echelon form. The diagrams that 
accompany the following analysis provide a convenient way of counting the operations 
required to introduce a leading 1 in the first row and then zeros below it. In our operation 
counts, we will lump divisions and multiplications together as “multiplications,” and we 
will lump additions and subtractions together as “additions.” 


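Step 1. It requires n flops (multiplications) to introduce the leading 1 in the first row.

$$\left[\begin{array}{ccccc|c}
1 & \times & \times & \times & \times & \times \\
\bullet & \bullet & \bullet & \bullet & \bullet & \bullet \\
\bullet & \bullet & \bullet & \bullet & \bullet & \bullet \\
\bullet & \bullet & \bullet & \bullet & \bullet & \bullet \\
\bullet & \bullet & \bullet & \bullet & \bullet & \bullet
\end{array}\right]$$

× denotes a quantity that is being computed.
• denotes a quantity that is not being computed.
The augmented matrix size is $n \times (n + 1)$.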


Step 2. It requires n multiplications and n additions to introduce a zero below the leading 1, and there are $n - 1$ rows below the leading 1, so the number of flops required to introduce zeros below the leading 1 is $2n(n - 1)$.

$$\left[\begin{array}{ccccc|c}
1 & \bullet & \bullet & \bullet & \bullet & \bullet \\
0 & \times & \times & \times & \times & \times \\
0 & \times & \times & \times & \times & \times \\
0 & \times & \times & \times & \times & \times \\
0 & \times & \times & \times & \times & \times
\end{array}\right]$$

Column 1. Combining Steps 1 and 2, the number of flops required for column 1 is

$$n + 2n(n - 1) = 2n^2 - n$$

Column 2. The procedure for column 2 is the same as for column 1, except that now we are dealing with one less row and one less column. Thus, the number of flops required to introduce the leading 1 in row 2 and the zeros below it can be obtained by replacing n by $n - 1$ in the flop count for the first column. Thus, the number of flops required for column 2 is

$$2(n - 1)^2 - (n - 1)$$

Column 3. By the argument for column 2, the number of flops required for column 3 is

$$2(n - 2)^2 - (n - 2)$$

Total. The pattern should now be clear. The total number of flops required to create the n leading 1's and the associated zeros is

$$(2n^2 - n) + [2(n - 1)^2 - (n - 1)] + [2(n - 2)^2 - (n - 2)] + \cdots + (2 - 1)$$

which we can rewrite as

$$2[n^2 + (n - 1)^2 + \cdots + 1] - [n + (n - 1) + \cdots + 1]$$

or on applying Formulas (1) and (2) as

$$2\,\frac{n(n+1)(2n+1)}{6} - \frac{n(n+1)}{2} = \frac{2}{3}n^3 + \frac{1}{2}n^2 - \frac{1}{6}n$$

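Next, let us count the number of operations required to complete the backward phase (the back substitution).

Column n. It requires $n - 1$ multiplications and $n - 1$ additions to introduce zeros above the leading 1 in the nth column, so the total number of flops required for the column is $2(n - 1)$.

$$\left[\begin{array}{cccccc|c}
1 & \bullet & \bullet & \cdots & \bullet & 0 & \times \\
0 & 1 & \bullet & \cdots & \bullet & 0 & \times \\
0 & 0 & 1 & \cdots & \bullet & 0 & \times \\
\vdots & \vdots & \vdots & & \vdots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & 1 & 0 & \times \\
0 & 0 & 0 & \cdots & 0 & 1 & \bullet
\end{array}\right]$$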


Column $(n - 1)$. The procedure is the same as for Step 1, except that now we are dealing with one less row. Thus, the number of flops required for the $(n - 1)$st column is $2(n - 2)$.

$$\left[\begin{array}{cccccc|c}
1 & \bullet & \bullet & \cdots & 0 & 0 & \times \\
0 & 1 & \bullet & \cdots & 0 & 0 & \times \\
0 & 0 & 1 & \cdots & 0 & 0 & \times \\
\vdots & \vdots & \vdots & & \vdots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & 1 & 0 & \bullet \\
0 & 0 & 0 & \cdots & 0 & 1 & \bullet
\end{array}\right]$$

Column $(n - 2)$. By the argument for column $(n - 1)$, the number of flops required for column $(n - 2)$ is $2(n - 3)$.

Total. The pattern should now be clear. The total number of flops to complete the backward phase is

$$2(n - 1) + 2(n - 2) + 2(n - 3) + \cdots + 2(n - n) = 2[n^2 - (1 + 2 + \cdots + n)]$$

which we can rewrite using Formula (1) as

$$2\left(n^2 - \frac{n(n+1)}{2}\right) = n^2 - n$$

In summary, we have shown that for Gauss-Jordan elimination the number of flops required for the forward and backward phases is

$$\text{flops for forward phase} = \tfrac{2}{3}n^3 + \tfrac{1}{2}n^2 - \tfrac{1}{6}n \tag{3}$$

$$\text{flops for backward phase} = n^2 - n \tag{4}$$

Thus, the total cost of solving a linear system by Gauss-Jordan elimination is

$$\text{flops for both phases} = \tfrac{2}{3}n^3 + \tfrac{3}{2}n^2 - \tfrac{7}{6}n \tag{5}$$
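
The column-by-column counts can be checked numerically. The sketch below adds up the per-column counts described above and compares the totals with the closed forms for the forward phase, the backward phase, and both phases combined; the function names are hypothetical, and exact rational arithmetic (`fractions.Fraction`) is used so that the comparison involves no rounding.

```python
from fractions import Fraction


def forward_flops(n):
    """Sum the forward-phase counts column by column."""
    total = 0
    for m in range(n, 0, -1):       # m = number of rows still being reduced
        total += m                  # multiplications that create the leading 1
        total += 2 * m * (m - 1)    # m mult. + m add. for each of the m - 1 rows below
    return total


def backward_flops(n):
    """Sum the backward-phase counts 2(n-1) + 2(n-2) + ... + 2(n-n)."""
    return sum(2 * (m - 1) for m in range(n, 0, -1))


def forward_formula(n):
    return Fraction(2, 3) * n**3 + Fraction(1, 2) * n**2 - Fraction(1, 6) * n


def backward_formula(n):
    return n**2 - n


def total_formula(n):
    return Fraction(2, 3) * n**3 + Fraction(3, 2) * n**2 - Fraction(7, 6) * n


for n in (1, 2, 5, 10, 100):
    assert forward_flops(n) == forward_formula(n)
    assert backward_flops(n) == backward_formula(n)
    assert forward_flops(n) + backward_flops(n) == total_formula(n)
    print(n, forward_flops(n), backward_flops(n))
```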


Cost Estimates for Solving Large Linear Systems

It is a property of polynomials that for large values of the independent variable the term of highest power makes the major contribution to the value of the polynomial. Thus, for large linear systems we can use (3) and (4) to approximate the number of flops in the forward and backward phases as

$$\text{flops for forward phase} \approx \tfrac{2}{3}n^3 \tag{6}$$

$$\text{flops for backward phase} \approx n^2 \tag{7}$$

This shows that it is more costly to execute the forward phase than the backward phase for large linear systems. Indeed, the cost difference between the forward and backward phases can be enormous, as the next example shows.

We leave it as an exercise for you to confirm the results in Table 1.

Table 1

Approximate Cost for an $n \times n$ Matrix A with Large n

Algorithm                                          Cost in Flops
Gauss-Jordan elimination (forward phase)           $\approx \frac{2}{3}n^3$
Gauss-Jordan elimination (backward phase)          $\approx n^2$
LU-decomposition of A                              $\approx \frac{2}{3}n^3$
Forward substitution to solve $L\mathbf{y} = \mathbf{b}$             $\approx n^2$
Backward substitution to solve $U\mathbf{x} = \mathbf{y}$            $\approx n^2$
$A^{-1}$ by reducing $[A \mid I]$ to $[I \mid A^{-1}]$           $\approx 2n^3$
Compute $A^{-1}\mathbf{b}$                                  $\approx 2n^3$

The cost in flops for Gaussian elimination is the same as that for the forward phase of Gauss-Jordan elimination.


► EXAMPLE 1 Cost of Solving a Large Linear System

Approximate the time required to execute the forward and backward phases of Gauss-Jordan elimination for a system of one million ($= 10^6$) equations in one million unknowns using a computer that can execute 10 petaflops per second (1 petaflop = $10^{15}$ flops).

Solution We have $n = 10^6$ for the given system, so from (6) and (7) the number of petaflops required for the forward and backward phases is

$$\text{petaflops for forward phase} \approx \tfrac{2}{3}n^3 \times 10^{-15} = \tfrac{2}{3}(10^6)^3 \times 10^{-15} = \tfrac{2}{3} \times 10^3$$

$$\text{petaflops for backward phase} \approx n^2 \times 10^{-15} = (10^6)^2 \times 10^{-15} = 10^{-3}$$

Thus, at 10 petaflops/s the execution times for the forward and backward phases are

$$\text{time for forward phase} \approx \left(\tfrac{2}{3} \times 10^3\right) \times 10^{-1}\ \text{s} \approx 66.67\ \text{s}$$

$$\text{time for backward phase} \approx (10^{-3}) \times 10^{-1}\ \text{s} = 0.0001\ \text{s}$$
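
The arithmetic of this example can be reproduced with a short script. It is only a sketch: the function names are hypothetical, and the cost estimates used are the large-n approximations $\frac{2}{3}n^3$ and $n^2$ for the forward and backward phases.

```python
def forward_phase_seconds(n, flops_per_second):
    """Approximate forward-phase time using the estimate (2/3) n^3 flops."""
    return (2 / 3) * n**3 / flops_per_second


def backward_phase_seconds(n, flops_per_second):
    """Approximate backward-phase time using the estimate n^2 flops."""
    return n**2 / flops_per_second


n = 10**6                 # one million equations in one million unknowns
rate = 10 * 10**15        # 10 petaflops per second
forward = forward_phase_seconds(n, rate)
backward = backward_phase_seconds(n, rate)
print(round(forward, 2), backward)
```

The two times differ by a factor of roughly $\frac{2}{3}n$, which is why the forward phase dominates the cost for large systems.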


Considerations in Choosing an Algorithm for Solving a Linear System

For a single linear system $A\mathbf{x} = \mathbf{b}$ of n equations in n unknowns, the methods of LU-decomposition and Gauss-Jordan elimination differ in bookkeeping but otherwise involve the same number of flops. Thus, neither method has a cost advantage over the other. However, LU-decomposition has the following advantages that make it the method of choice:

• Gauss-Jordan elimination and Gaussian elimination both use the augmented matrix $[A \mid \mathbf{b}]$, so $\mathbf{b}$ must be known. In contrast, LU-decomposition uses only the matrix A, so once that decomposition is known it can be used with as many right-hand sides as are required.

• The LU-decomposition that is computed to solve $A\mathbf{x} = \mathbf{b}$ can be used to compute $A^{-1}$, if needed, with little additional work.

• For large linear systems in which computer memory is at a premium, one can dispense with the storage of the 1's and zeros that appear on or below the main diagonal of U, since those entries are known from the form of U. The space that this opens up can then be used to store the entries of L, thereby reducing the amount of memory required to solve the system.

• If A is a large matrix consisting mostly of zeros, and if the nonzero entries are concentrated in a "band" around the main diagonal, then there are techniques that can be used to reduce the cost of LU-decomposition, giving it an advantage over Gauss-Jordan elimination.
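
The first advantage, reusing one factorization for several right-hand sides, can be illustrated with a small sketch. Everything here is an assumption made for illustration: the function names are hypothetical, the matrix is an arbitrary example, no row interchanges are performed (so the sketch fails if a zero pivot is met), and U is taken to have 1's on its main diagonal, matching the form of U described above.

```python
def lu_decompose(A):
    """Factor A = LU with L lower triangular and U upper triangular with
    1's on the diagonal. Assumes no row interchanges are needed."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    U = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for j in range(n):
        for i in range(j, n):          # column j of L
            L[i][j] = A[i][j] - sum(L[i][k] * U[k][j] for k in range(j))
        for i in range(j + 1, n):      # row j of U, right of the diagonal
            U[j][i] = (A[j][i] - sum(L[j][k] * U[k][i] for k in range(j))) / L[j][j]
    return L, U


def solve_with_lu(L, U, b):
    """Solve Ly = b by forward substitution, then Ux = y by back substitution."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = y[i] - sum(U[i][k] * x[k] for k in range(i + 1, n))
    return x


A = [[2, 1, 1], [4, 3, 3], [8, 7, 9]]
L, U = lu_decompose(A)                  # the costly step, done once
for b in ([4, 10, 24], [1, 1, -1]):     # each extra right-hand side is cheap
    print(solve_with_lu(L, U, b))
```

Only the two triangular solves are repeated for each new right-hand side, which is where the saving comes from.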


Exercise Set 9.3 

1. A certain computer can execute 10 gigaflops per second. Use 
Formula (5) to find the time required to solve the system using 
Gauss-Jordan elimination. 

(a) A system of 1000 equations in 1000 unknowns. 

(b) A system of 10,000 equations in 10,000 unknowns. 

(c) A system of 100,000 equations in 100,000 unknowns. 

2. A certain computer can execute 100 gigaflops per second. Use 
Formula (5) to find the time required to solve the system using 
Gauss-Jordan elimination. 

(a) A system of 10,000 equations in 10,000 unknowns. 

(b) A system of 100,000 equations in 100,000 unknowns. 

(c) A system of 1,000,000 equations in 1,000,000 unknowns. 

3. A certain computer can execute 70 gigaflops per second. Use Table 1 to estimate the time required to perform the following operations on the invertible 10,000 × 10,000 matrix A.

(a) Execute the forward phase of Gauss-Jordan elimination.

(b) Execute the backward phase of Gauss-Jordan elimination.

(c) LU-decomposition of A.

(d) Find $A^{-1}$ by reducing $[A \mid I]$ to $[I \mid A^{-1}]$.

4. The IBM Sequoia computer can operate at speeds in excess of 16 petaflops per second (1 petaflop = $10^{15}$ flops). Use Table 1 to estimate the time required to perform the following operations on an invertible 100,000 × 100,000 matrix A.

(a) Execute the forward phase of Gauss-Jordan elimination.

(b) Execute the backward phase of Gauss-Jordan elimination.

(c) LU-decomposition of A.

(d) Find $A^{-1}$ by reducing $[A \mid I]$ to $[I \mid A^{-1}]$.

5. (a) Approximate the time required to execute the forward phase of Gauss-Jordan elimination for a system of 100,000 equations in 100,000 unknowns using a computer that can execute 1 gigaflop per second. Do the same for the backward phase. (See Table 1.)

(b) How many gigaflops per second must a computer be able to execute to find the LU-decomposition of a matrix of size 10,000 × 10,000 in less than 0.5 s? (See Table 1.)

6. About how many teraflops per second must a computer be able to execute to find the inverse of a matrix of size 100,000 × 100,000 in less than 0.5 s? (1 teraflop = $10^{12}$ flops.)

In Exercises 7-10, A and B are $n \times n$ matrices and c is a real number.

7. How many flops are required to compute $cA$?

8. How many flops are required to compute $A + B$?

9. How many flops are required to compute $AB$?

10. If A is a diagonal matrix and k is a positive integer, how many flops are required to compute $A^k$?


9.4 Singular Value Decomposition 

In this section we will discuss an extension of the diagonalization theory for n x n 
symmetric matrices to general m x n matrices. The results that we will develop in this 
section have applications to compression, storage, and transmission of digitized 
information and form the basis for many of the best computational algorithms that are 
currently available for solving linear systems. 


Decompositions of Square Matrices

We saw in Formula (2) of Section 7.2 that every symmetric matrix A can be expressed as

$$A = PDP^T \tag{1}$$

where P is an $n \times n$ orthogonal matrix of eigenvectors of A, and D is the diagonal matrix whose diagonal entries are the eigenvalues corresponding to the column vectors of P. In this section we will call (1) an eigenvalue decomposition of A (abbreviated EVD of A).

If an $n \times n$ matrix A is not symmetric, then it does not have an eigenvalue decomposition, but it does have a Hessenberg decomposition

$$A = PHP^T$$

in which P is an orthogonal matrix and H is in upper Hessenberg form (Theorem 7.2.4). Moreover, if A has real eigenvalues, then it has a Schur decomposition

$$A = PSP^T$$

in which P is an orthogonal matrix and S is upper triangular (Theorem 7.2.3).

The eigenvalue, Hessenberg, and Schur decompositions are important in numerical algorithms not only because the matrices D, H, and S have simpler forms than A, but also because the orthogonal matrices that appear in these factorizations do not magnify roundoff error. To see why this is so, suppose that $\mathbf{x}$ is a column vector whose entries are known exactly and that

$$\hat{\mathbf{x}} = \mathbf{x} + \mathbf{e}$$

is the vector that results when roundoff error is present in the entries of $\mathbf{x}$. If P is an orthogonal matrix, then the length-preserving property of orthogonal transformations implies that

$$\|P\hat{\mathbf{x}} - P\mathbf{x}\| = \|\hat{\mathbf{x}} - \mathbf{x}\| = \|\mathbf{e}\|$$

which tells us that the error in approximating $P\mathbf{x}$ by $P\hat{\mathbf{x}}$ has the same magnitude as the error in approximating $\mathbf{x}$ by $\hat{\mathbf{x}}$.
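
A small numerical experiment can make the length-preserving property concrete. In this sketch the orthogonal matrix is taken to be a rotation of the plane through an angle `theta`; the vector, the error vector, and the angle are arbitrary illustrative values, and the helper names are hypothetical.

```python
import math


def rotate(theta, v):
    """Multiply the 2-vector v by the rotation matrix through angle theta,
    which is an orthogonal matrix."""
    c, s = math.cos(theta), math.sin(theta)
    return [c * v[0] - s * v[1], s * v[0] + c * v[1]]


def norm(v):
    """Euclidean length of a vector."""
    return math.sqrt(sum(t * t for t in v))


def difference(u, v):
    return [a - b for a, b in zip(u, v)]


x = [3.0, 4.0]                               # exact vector
e = [1e-3, -2e-3]                            # simulated roundoff error
x_hat = [a + b for a, b in zip(x, e)]        # perturbed vector

theta = 0.7
error_before = norm(difference(x_hat, x))
error_after = norm(difference(rotate(theta, x_hat), rotate(theta, x)))
print(error_before, error_after)             # equal up to floating-point rounding
```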

There are two main paths that one might follow in looking for other kinds of decompositions of a general square matrix A: One might look for decompositions of the form

$$A = PJP^{-1}$$

in which P is invertible but not necessarily orthogonal, or one might look for decompositions of the form

$$A = U\Sigma V^T$$

in which U and V are orthogonal but not necessarily the same. The first path leads to decompositions in which J is either diagonal or a certain kind of block diagonal matrix, called a Jordan canonical form in honor of the French mathematician Camille Jordan (see p. 518). Jordan canonical forms, which we will not consider in this text, are important theoretically and in certain applications, but they are of lesser importance numerically because of the roundoff difficulties that result from the lack of orthogonality in P. In this section we will focus on the second path.

Singular Values

Since matrix products of the form $A^TA$ will play an important role in our work, we will begin with two basic theorems about them.

THEOREM 9.4.1 If A is an $m \times n$ matrix, then:

(a) A and $A^TA$ have the same null space.

(b) A and $A^TA$ have the same row space.

(c) $A^T$ and $A^TA$ have the same column space.

(d) A and $A^TA$ have the same rank.

We will prove part (a) and leave the remaining proofs for the exercises.

Proof (a) We must show that every solution of $A\mathbf{x} = \mathbf{0}$ is a solution of $A^TA\mathbf{x} = \mathbf{0}$, and conversely. If $\mathbf{x}_0$ is any solution of $A\mathbf{x} = \mathbf{0}$, then $\mathbf{x}_0$ is also a solution of $A^TA\mathbf{x} = \mathbf{0}$ since

$$A^TA\mathbf{x}_0 = A^T(A\mathbf{x}_0) = A^T\mathbf{0} = \mathbf{0}$$

Conversely, if $\mathbf{x}_0$ is any solution of $A^TA\mathbf{x} = \mathbf{0}$, then $\mathbf{x}_0$ is in the null space of $A^TA$ and hence is orthogonal to all vectors in the row space of $A^TA$ by part (a) of Theorem 4.8.8. However, $A^TA$ is symmetric, so $\mathbf{x}_0$ is also orthogonal to every vector in the column space of $A^TA$. In particular, $\mathbf{x}_0$ must be orthogonal to the vector $(A^TA)\mathbf{x}_0$; that is,

$$\mathbf{x}_0 \cdot (A^TA)\mathbf{x}_0 = 0$$

Using the first formula in Table 1 of Section 3.2 and properties of the transpose operation we can rewrite this as

$$\mathbf{x}_0^T(A^TA)\mathbf{x}_0 = (A\mathbf{x}_0)^T(A\mathbf{x}_0) = (A\mathbf{x}_0) \cdot (A\mathbf{x}_0) = \|A\mathbf{x}_0\|^2 = 0$$

which implies that $A\mathbf{x}_0 = \mathbf{0}$, thereby proving that $\mathbf{x}_0$ is a solution of $A\mathbf{x} = \mathbf{0}$.


THEOREM 9.4.2 If A is an $m \times n$ matrix, then:

(a) $A^TA$ is orthogonally diagonalizable.

(b) The eigenvalues of $A^TA$ are nonnegative.

Proof (a) The matrix $A^TA$, being symmetric, is orthogonally diagonalizable by Theorem 7.2.1.

Proof (b) Since $A^TA$ is orthogonally diagonalizable, there is an orthonormal basis for $R^n$ consisting of eigenvectors of $A^TA$, say $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}$. If we let $\lambda_1, \lambda_2, \ldots, \lambda_n$ be the corresponding eigenvalues, then for $1 \leq i \leq n$ we have

$$\|A\mathbf{v}_i\|^2 = A\mathbf{v}_i \cdot A\mathbf{v}_i = \mathbf{v}_i \cdot A^TA\mathbf{v}_i \qquad \text{[Formula (26) of Section 3.2]}$$

$$= \mathbf{v}_i \cdot \lambda_i\mathbf{v}_i = \lambda_i(\mathbf{v}_i \cdot \mathbf{v}_i) = \lambda_i\|\mathbf{v}_i\|^2 = \lambda_i$$

It follows from this relationship that $\lambda_i \geq 0$.

DEFINITION 1 If A is an $m \times n$ matrix, and if $\lambda_1, \lambda_2, \ldots, \lambda_n$ are the eigenvalues of $A^TA$, then the numbers

$$\sigma_1 = \sqrt{\lambda_1},\quad \sigma_2 = \sqrt{\lambda_2},\quad \ldots,\quad \sigma_n = \sqrt{\lambda_n}$$

are called the singular values of A.

We will assume throughout this section that the eigenvalues of $A^TA$ are named so that

$$\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_n \geq 0$$

and hence that

$$\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_n \geq 0$$


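► EXAMPLE 1 Singular Values

Find the singular values of the matrix

$$A = \begin{bmatrix} 1 & 1 \\ 0 & 1 \\ 1 & 0 \end{bmatrix}$$

Solution The first step is to find the eigenvalues of the matrix

$$A^TA = \begin{bmatrix} 1 & 0 & 1 \\ 1 & 1 & 0 \end{bmatrix}\begin{bmatrix} 1 & 1 \\ 0 & 1 \\ 1 & 0 \end{bmatrix} = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}$$

The characteristic polynomial of $A^TA$ is

$$\lambda^2 - 4\lambda + 3 = (\lambda - 3)(\lambda - 1)$$

so the eigenvalues of $A^TA$ are $\lambda_1 = 3$ and $\lambda_2 = 1$ and the singular values of A in order of decreasing size are

$$\sigma_1 = \sqrt{\lambda_1} = \sqrt{3}, \qquad \sigma_2 = \sqrt{\lambda_2} = 1$$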
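
The computation in this example can be mirrored in code for any matrix with two columns. The sketch below forms $A^TA$, finds the two eigenvalues of that symmetric $2 \times 2$ matrix from its characteristic polynomial, and takes square roots. The function names are hypothetical, and the restriction to two columns is an assumption made so that the eigenvalues can be found with the quadratic formula.

```python
import math


def gram(A):
    """Return A^T A for a matrix A stored as a list of rows."""
    n = len(A[0])
    return [[sum(row[i] * row[j] for row in A) for j in range(n)] for i in range(n)]


def eigenvalues_sym_2x2(M):
    """Eigenvalues of the symmetric matrix [[a, b], [b, d]], larger one first,
    obtained from the roots of its characteristic polynomial."""
    a, b, d = M[0][0], M[0][1], M[1][1]
    mean = (a + d) / 2
    radius = math.sqrt(((a - d) / 2) ** 2 + b ** 2)
    return mean + radius, mean - radius


def singular_values_two_columns(A):
    """Singular values of a matrix with two columns, in nonincreasing order."""
    lam1, lam2 = eigenvalues_sym_2x2(gram(A))
    # max(..., 0.0) guards against tiny negative values caused by rounding
    return math.sqrt(max(lam1, 0.0)), math.sqrt(max(lam2, 0.0))


A = [[1, 1], [0, 1], [1, 0]]
print(gram(A))                          # [[2, 1], [1, 2]]
print(singular_values_two_columns(A))   # approximately (1.7320508, 1.0)
```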


Singular Value Decomposition

Before turning to the main result in this section, we will find it useful to extend the notion of a "main diagonal" to matrices that are not square. We define the main diagonal of an $m \times n$ matrix to be the line of entries shown in Figure 9.4.1; it starts at the upper left corner and extends diagonally as far as it can go. We will refer to the entries on the main diagonal as the diagonal entries.

▲ Figure 9.4.1 (main diagonal of a nonsquare matrix)

We are now ready to consider the main result in this section, which is concerned with a specific way of factoring a general $m \times n$ matrix A. This factorization, called singular value decomposition (abbreviated SVD), will be given in two forms, a brief form that captures the main idea, and an expanded form that spells out the details. The proof is given at the end of this section.


THEOREM 9.4.3 Singular Value Decomposition (Brief Form)

If A is an $m \times n$ matrix of rank k, then A can be expressed in the form $A = U\Sigma V^T$, where $\Sigma$ has size $m \times n$ and can be expressed in partitioned form as

$$\Sigma = \begin{bmatrix} D & 0_{k \times (n-k)} \\ 0_{(m-k) \times k} & 0_{(m-k) \times (n-k)} \end{bmatrix}$$

in which D is a diagonal $k \times k$ matrix whose successive entries are the first k singular values of A in nonincreasing order, U is an $m \times m$ orthogonal matrix, and V is an $n \times n$ orthogonal matrix.

Harry Bateman
(1882-1946)

Historical Note The term singular value is apparently due to the British-born mathematician Harry Bateman, who used it in a research paper published in 1908. Bateman emigrated to the United States in 1910, teaching at Bryn Mawr College, Johns Hopkins University, and finally at the California Institute of Technology. Interestingly, he was awarded his Ph.D. in 1913 by Johns Hopkins, at which point he was already an eminent mathematician with 60 publications to his name.

[Image: Courtesy of the Archives, California Institute of Technology]

THEOREM 9.4.4 Singular Value Decomposition (Expanded Form)

If A is an $m \times n$ matrix of rank k, then A can be factored as

$$A = U\Sigma V^T = \begin{bmatrix} \mathbf{u}_1 & \mathbf{u}_2 & \cdots & \mathbf{u}_k \mid \mathbf{u}_{k+1} & \cdots & \mathbf{u}_m \end{bmatrix}
\begin{bmatrix}
\begin{matrix} \sigma_1 & 0 & \cdots & 0 \\ 0 & \sigma_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_k \end{matrix} & 0_{k \times (n-k)} \\
0_{(m-k) \times k} & 0_{(m-k) \times (n-k)}
\end{bmatrix}
\begin{bmatrix} \mathbf{v}_1^T \\ \mathbf{v}_2^T \\ \vdots \\ \mathbf{v}_k^T \\ \mathbf{v}_{k+1}^T \\ \vdots \\ \mathbf{v}_n^T \end{bmatrix}$$

in which U, $\Sigma$, and V have sizes $m \times m$, $m \times n$, and $n \times n$, respectively, and in which:

(a) $V = [\mathbf{v}_1 \ \mathbf{v}_2 \ \cdots \ \mathbf{v}_n]$ orthogonally diagonalizes $A^TA$.

(b) The nonzero diagonal entries of $\Sigma$ are $\sigma_1 = \sqrt{\lambda_1}, \sigma_2 = \sqrt{\lambda_2}, \ldots, \sigma_k = \sqrt{\lambda_k}$, where $\lambda_1, \lambda_2, \ldots, \lambda_k$ are the nonzero eigenvalues of $A^TA$ corresponding to the column vectors of V.

(c) The column vectors of V are ordered so that $\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_k > 0$.

(d) $\mathbf{u}_i = \dfrac{A\mathbf{v}_i}{\|A\mathbf{v}_i\|} = \dfrac{1}{\sigma_i}A\mathbf{v}_i \quad (i = 1, 2, \ldots, k)$

(e) $\{\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_k\}$ is an orthonormal basis for col(A).

(f) $\{\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_k, \mathbf{u}_{k+1}, \ldots, \mathbf{u}_m\}$ is an extension of $\{\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_k\}$ to an orthonormal basis for $R^m$.

The vectors $\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_k$ are called the left singular vectors of A, and the vectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k$ are called the right singular vectors of A.


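► EXAMPLE 2 Singular Value Decomposition if A Is Not Square 

Find a singular value decomposition of the matrix 

$$A = \begin{bmatrix} 1 & 1 \\ 0 & 1 \\ 1 & 0 \end{bmatrix}$$

Solution We showed in Example 1 that the eigenvalues of $A^TA$ are $\lambda_1 = 3$ and $\lambda_2 = 1$ 
and that the corresponding singular values of A are $\sigma_1 = \sqrt{3}$ and $\sigma_2 = 1$. We leave it 
for you to verify that 

$$v_1 = \begin{bmatrix} \frac{\sqrt{2}}{2} \\ \frac{\sqrt{2}}{2} \end{bmatrix} \quad \text{and} \quad v_2 = \begin{bmatrix} \frac{\sqrt{2}}{2} \\ -\frac{\sqrt{2}}{2} \end{bmatrix}$$

are eigenvectors corresponding to $\lambda_1$ and $\lambda_2$, respectively, and that $V = [v_1 \mid v_2]$ 
orthogonally diagonalizes $A^TA$. From part (d) of Theorem 9.4.4, the vectors 

$$u_1 = \frac{1}{\sigma_1} Av_1 = \frac{\sqrt{3}}{3} \begin{bmatrix} 1 & 1 \\ 0 & 1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} \frac{\sqrt{2}}{2} \\ \frac{\sqrt{2}}{2} \end{bmatrix} = \begin{bmatrix} \frac{\sqrt{6}}{3} \\ \frac{\sqrt{6}}{6} \\ \frac{\sqrt{6}}{6} \end{bmatrix}$$

$$u_2 = \frac{1}{\sigma_2} Av_2 = (1) \begin{bmatrix} 1 & 1 \\ 0 & 1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} \frac{\sqrt{2}}{2} \\ -\frac{\sqrt{2}}{2} \end{bmatrix} = \begin{bmatrix} 0 \\ -\frac{\sqrt{2}}{2} \\ \frac{\sqrt{2}}{2} \end{bmatrix}$$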
are two of the three column vectors of U. Note that $u_1$ and $u_2$ are orthonormal, as 
expected. We could extend the set $\{u_1, u_2\}$ to an orthonormal basis for $R^3$. However, the 
computations will be easier if we first remove the messy radicals by multiplying $u_1$ and 
$u_2$ by appropriate scalars. Thus, we will look for a unit vector $u_3$ that is orthogonal to 

$$\sqrt{6}\,u_1 = \begin{bmatrix} 2 \\ 1 \\ 1 \end{bmatrix} \quad \text{and} \quad \sqrt{2}\,u_2 = \begin{bmatrix} 0 \\ -1 \\ 1 \end{bmatrix}$$

To satisfy these two orthogonality conditions, the vector $u_3$ must be a solution of the 
homogeneous linear system 

$$\begin{bmatrix} 2 & 1 & 1 \\ 0 & -1 & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$$

We leave it for you to show that a general solution of this system is 

$$\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = t \begin{bmatrix} -1 \\ 1 \\ 1 \end{bmatrix}$$

Normalizing the vector on the right yields 

$$u_3 = \begin{bmatrix} -\frac{1}{\sqrt{3}} \\ \frac{1}{\sqrt{3}} \\ \frac{1}{\sqrt{3}} \end{bmatrix}$$

Eugenio Beltrami Camille Jordan 

(1835-1900) (1838-1922) 




Herman Klaus Weyl Gene H. Golub 

(1885-1955) (1932-2007) 


Historical Note The theory of singular value decompo- 
sitions can be traced back to the work of five people: the 
Italian mathematician Eugenio Beltrami, the French math- 
ematician Camille Jordan, the English mathematician 
James Sylvester (see p. 35), and the German mathemati- 
cians Erhard Schmidt (see p. 371) and the mathematician 
Herman Weyl. More recently, the pioneering efforts of 
the American mathematician Gene Golub produced a 
stable and efficient algorithm for computing it. Beltrami 
and Jordan were the progenitors of the decomposition — 
Beltrami gave a proof of the result for real, invertible 
matrices with distinct singular values in 1873. Subse- 
quently, Jordan refined the theory and eliminated the 
unnecessary restrictions imposed by Beltrami. Sylvester, 
apparently unfamiliar with the work of Beltrami and 
Jordan, rediscovered the result in 1889 and suggested its 
importance. Schmidt was the first person to show that the 
singular value decomposition could be used to approxi- 
mate a matrix by another matrix with lower rank, and, in 
so doing, he transformed it from a mathematical curiosity 
to an important practical tool. Weyl showed how to find 
the lower rank approximations in the presence of error. 
[Images: http://www-history.mcs.st-andrews.ac.uk/ 
history/PictDisplay/Beltrami.html ( Beltrami): The 
Granger Collection, New York (Jordan); Courtesy 
Electronic Publishing Services, Inc., New York City 
(Weyl); Courtesy of Hector Garcia-Molina (Golub)] 
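Thus, the singular value decomposition of A is 

$$\underbrace{\begin{bmatrix} 1 & 1 \\ 0 & 1 \\ 1 & 0 \end{bmatrix}}_{A} = \underbrace{\begin{bmatrix} \frac{\sqrt{6}}{3} & 0 & -\frac{1}{\sqrt{3}} \\ \frac{\sqrt{6}}{6} & -\frac{\sqrt{2}}{2} & \frac{1}{\sqrt{3}} \\ \frac{\sqrt{6}}{6} & \frac{\sqrt{2}}{2} & \frac{1}{\sqrt{3}} \end{bmatrix}}_{U} \underbrace{\begin{bmatrix} \sqrt{3} & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix}}_{\Sigma} \underbrace{\begin{bmatrix} \frac{\sqrt{2}}{2} & \frac{\sqrt{2}}{2} \\ \frac{\sqrt{2}}{2} & -\frac{\sqrt{2}}{2} \end{bmatrix}}_{V^T}$$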


You may want to confirm the validity of this equation by multiplying out the matrices 
on the right side. 
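
The multiplication suggested above can also be carried out numerically. The following is a minimal sketch in plain Python, not part of the text's method; the helper names `matmul`, `transpose`, and `close` are illustrative assumptions. It multiplies out the three factors found in Example 2, compares the product with A, and also checks that the columns of U are orthonormal.

```python
import math

def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*X)]

def close(X, Y, tol=1e-12):
    """Entrywise comparison up to a small floating-point tolerance."""
    return all(abs(a - b) <= tol for rx, ry in zip(X, Y) for a, b in zip(rx, ry))

s2, s3, s6 = math.sqrt(2), math.sqrt(3), math.sqrt(6)

A = [[1, 1], [0, 1], [1, 0]]
U = [[s6 / 3, 0.0, -1 / s3],
     [s6 / 6, -s2 / 2, 1 / s3],
     [s6 / 6, s2 / 2, 1 / s3]]
Sigma = [[s3, 0.0], [0.0, 1.0], [0.0, 0.0]]
V = [[s2 / 2, s2 / 2], [s2 / 2, -s2 / 2]]

# Multiply out U * Sigma * V^T and compare with A.
product = matmul(matmul(U, Sigma), transpose(V))
reconstructs_A = close(product, A)

# U should be orthogonal, so U^T U should be the 3 x 3 identity matrix.
identity3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
U_is_orthogonal = close(matmul(transpose(U), U), identity3)

print(reconstructs_A, U_is_orthogonal)
```

A tolerance is used because the radicals are stored as floating-point numbers, so the products agree with A only up to rounding.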


We conclude this section with an optional proof of Theorem 9.4.4. 

Proof of Theorem 9.4.4 For notational simplicity we will prove this theorem in the case 
where A is an n x n matrix. To modify the argument for an m x n matrix you need only 
make the notational adjustments required to account for the possibility that m > n or 
n > m. 

The matrix $A^TA$ is symmetric, so it has an eigenvalue decomposition 

$$A^TA = VDV^T$$

in which the column vectors of 

$$V = [v_1 \mid v_2 \mid \cdots \mid v_n]$$

are unit eigenvectors of $A^TA$, and D is a diagonal matrix whose successive diagonal 
entries $\lambda_1, \lambda_2, \ldots, \lambda_n$ are the eigenvalues of $A^TA$ corresponding in succession to the 
column vectors of V. Since A is assumed to have rank k, it follows from Theorem 9.4.1 
that $A^TA$ also has rank k. It follows as well that D has rank k, since it is similar to $A^TA$ 
and rank is a similarity invariant. Thus, D can be expressed in the form 

$$D = \begin{bmatrix} \lambda_1 & & & & & \\ & \lambda_2 & & & & \\ & & \ddots & & & \\ & & & \lambda_k & & \\ & & & & 0 & \\ & & & & & \ddots \\ & & & & & & 0 \end{bmatrix} \tag{2}$$

where $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_k > 0$. Now let us consider the set of image vectors 

$$\{Av_1, Av_2, \ldots, Av_n\} \tag{3}$$

This is an orthogonal set, for if $i \ne j$, then the orthogonality of $v_i$ and $v_j$ implies that 

$$Av_i \cdot Av_j = v_i \cdot A^TAv_j = v_i \cdot \lambda_j v_j = \lambda_j (v_i \cdot v_j) = 0$$

Moreover, the first k vectors in (3) are nonzero since we showed in the proof of 
Theorem 9.4.2(b) that $\|Av_i\|^2 = \lambda_i$ for $i = 1, 2, \ldots, n$, and we have assumed that the first k 
diagonal entries in (2) are positive. Thus, 

$$S = \{Av_1, Av_2, \ldots, Av_k\}$$

is an orthogonal set of nonzero vectors in the column space of A. But the column space 
of A has dimension k since 

$$\operatorname{rank}(A) = \operatorname{rank}(A^TA) = k$$
and hence S, being a linearly independent set of k vectors, must be an orthogonal basis 
for col(A). If we now normalize the vectors in S, we will obtain an orthonormal basis 
$\{u_1, u_2, \ldots, u_k\}$ for col(A) in which 

$$u_i = \frac{Av_i}{\|Av_i\|} = \frac{1}{\sqrt{\lambda_i}} Av_i \quad (1 \le i \le k)$$

or, equivalently, in which 

$$Av_1 = \sqrt{\lambda_1}\,u_1 = \sigma_1 u_1, \quad Av_2 = \sqrt{\lambda_2}\,u_2 = \sigma_2 u_2, \quad \ldots, \quad Av_k = \sqrt{\lambda_k}\,u_k = \sigma_k u_k \tag{4}$$

It follows from Theorem 6.3.6 that we can extend this to an orthonormal basis 

$$\{u_1, u_2, \ldots, u_k, u_{k+1}, \ldots, u_n\}$$

for $R^n$. Now let U be the orthogonal matrix 

$$U = [u_1 \; u_2 \; \cdots \; u_k \; u_{k+1} \; \cdots \; u_n]$$

and let $\Sigma$ be the diagonal matrix 

$$\Sigma = \begin{bmatrix} \sigma_1 & & & & & \\ & \sigma_2 & & & & \\ & & \ddots & & & \\ & & & \sigma_k & & \\ & & & & 0 & \\ & & & & & \ddots \\ & & & & & & 0 \end{bmatrix}$$

It follows from (4), and the fact that $Av_i = 0$ for $i > k$, that 

$$U\Sigma = [\sigma_1 u_1 \; \sigma_2 u_2 \; \cdots \; \sigma_k u_k \; 0 \; \cdots \; 0] = [Av_1 \; Av_2 \; \cdots \; Av_k \; Av_{k+1} \; \cdots \; Av_n] = AV$$

which we can rewrite using the orthogonality of V as $A = U\Sigma V^T$. 


Exercise Set 9.4 

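In Exercises 1-4, find the distinct singular values of A. 

1. $A = \begin{bmatrix} 1 & 2 & 0 \end{bmatrix}$ 

2. $A = \begin{bmatrix} 3 & 0 \\ 0 & 4 \end{bmatrix}$ 

3. $A = \begin{bmatrix} 1 & -2 \\ 2 & 1 \end{bmatrix}$ 

4. $A = \begin{bmatrix} \sqrt{2} & 0 \\ 1 & \sqrt{2} \end{bmatrix}$ 

In Exercises 5-12, find a singular value decomposition of A. 

5. $A = \begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix}$ 

6. $A = \begin{bmatrix} -3 & 0 \\ 0 & -4 \end{bmatrix}$ 

7. $A = \begin{bmatrix} 4 & 6 \\ 0 & 4 \end{bmatrix}$ 

8. $A = \begin{bmatrix} 3 & 3 \\ 3 & 3 \end{bmatrix}$ 

9. $A = \begin{bmatrix} -2 & 2 \\ -1 & 1 \\ 2 & -2 \end{bmatrix}$ 

10. $A = \begin{bmatrix} -2 & -1 & 2 \\ 2 & 1 & -2 \end{bmatrix}$ 

11. $A = \begin{bmatrix} 1 & 0 \\ 1 & 1 \\ -1 & 1 \end{bmatrix}$ 

12. $A = \begin{bmatrix} 6 & 4 \\ 0 & 0 \\ 4 & 0 \end{bmatrix}$ 

Working with Proofs 

13. Prove: If A is an m x n matrix, then $A^TA$ and $AA^T$ have the 
same rank. 

14. Prove part (d) of Theorem 9.4.1 by using part (a) of the theorem 
and the fact that A and $A^TA$ have n columns. 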
15. (a) Prove part (b) of Theorem 9.4.1 by first showing that 
$\operatorname{row}(A^TA)$ is a subspace of row(A). 

(b) Prove part (c) of Theorem 9.4.1 by using part (b). 

16. Let $T: R^n \to R^m$ be a linear transformation whose standard 
matrix A has the singular value decomposition $A = U\Sigma V^T$, 
and let $B = \{v_1, v_2, \ldots, v_n\}$ and $B' = \{u_1, u_2, \ldots, u_m\}$ be 
the column vectors of V and U, respectively. Prove that 
$\Sigma = [T]_{B',B}$. 

17. Prove that the singular values of $A^TA$ are the squares of the 
singular values of A. 

18. Prove that if $A = U\Sigma V^T$ is a singular value decomposition of 
A, then U orthogonally diagonalizes $AA^T$. 

19. A polar decomposition of an n x n matrix A is a factorization 
A = PQ in which P is a positive semidefinite n x n matrix 
with the same rank as A, and Q is an orthogonal n x n matrix. 

(a) Prove that if $A = U\Sigma V^T$ is the singular value decomposition 
of A, then $A = (U\Sigma U^T)(UV^T)$ is a polar decomposition of A. 

(b) Find a polar decomposition of the matrix in Exercise 5. 

True-False Exercises 

TF. In parts (a)-(g) determine whether the statement is true or 
false, and justify your answer. 


(a) If A is an m x n matrix, then $A^TA$ is an m x m matrix. 

(b) If A is an m x n matrix, then $A^TA$ is a symmetric matrix. 

(c) If A is an m x n matrix, then the eigenvalues of $A^TA$ are positive 
real numbers. 

(d) If A is an n x n matrix, then A is orthogonally diagonalizable. 

(e) If A is an m x n matrix, then $A^TA$ is orthogonally diagonalizable. 

(f) The eigenvalues of $A^TA$ are the singular values of A. 

(g) Every m x n matrix has a singular value decomposition. 

Working with Technology 

T1. Use your technology utility to duplicate the computations in 
Example 2. 

T2. For the given matrix A, use the steps in Example 2 to 
find matrices U, $\Sigma$, and $V^T$ in a singular value decomposition 
$A = U\Sigma V^T$. 
(a) $A = \begin{bmatrix} -2 & -1 & 2 \\ 2 & 1 & -2 \end{bmatrix}$ 

(b) A = 

1 

1 

L - 1 


-1 

1 _ 


9.5 Data Compression Using Singular 
Value Decomposition 

Efficient transmission and storage of large quantities of digital data has become a major 
problem in our technological world. In this section we will discuss the role that singular 
value decomposition plays in compressing digital data so that it can be transmitted more 
rapidly and stored in less space. We assume here that you have read Section 9.4. 


Reduced Singular Value Decomposition 

Algebraically, the zero rows and columns of the matrix $\Sigma$ in Theorem 9.4.4 are 
superfluous and can be eliminated by multiplying out the expression $U\Sigma V^T$ using block 
multiplication and the partitioning shown in that formula. The products that involve 
zero blocks as factors drop out, leaving 

$$A = \begin{bmatrix} u_1 & u_2 & \cdots & u_k \end{bmatrix} \begin{bmatrix} \sigma_1 & 0 & \cdots & 0 \\ 0 & \sigma_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_k \end{bmatrix} \begin{bmatrix} v_1^T \\ v_2^T \\ \vdots \\ v_k^T \end{bmatrix} \tag{1}$$

which is called a reduced singular value decomposition of A. In this text we will denote 
the matrices on the right side of (1) by $U_1$, $\Sigma_1$, and $V_1^T$, respectively, and we will write 
this equation as 

$$A = U_1 \Sigma_1 V_1^T \tag{2}$$

Note that the sizes of $U_1$, $\Sigma_1$, and $V_1^T$ are m x k, k x k, and k x n, respectively, and 
that the matrix $\Sigma_1$ is invertible since its diagonal entries are positive. 

If we multiply out on the right side of (1) using the column-row rule, then we obtain 

$$A = \sigma_1 u_1 v_1^T + \sigma_2 u_2 v_2^T + \cdots + \sigma_k u_k v_k^T \tag{3}$$

which is called a reduced singular value expansion of A. This result applies to all 
matrices, whereas the spectral decomposition [Formula (7) of Section 7.2] applies only to 
symmetric matrices. 

Remark It can be proved that an m x n matrix M has rank 1 if and only if it can be factored 
as $M = uv^T$, where u is a column vector in $R^m$ and v is a column vector in $R^n$. Thus, a reduced 
singular value decomposition expresses a matrix A of rank k as a linear combination of k rank 1 
matrices. 


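► EXAMPLE 1 Reduced Singular Value Decomposition 

Find a reduced singular value decomposition and a reduced singular value expansion of 
the matrix 

$$A = \begin{bmatrix} 1 & 1 \\ 0 & 1 \\ 1 & 0 \end{bmatrix}$$

Solution In Example 2 of Section 9.4 we found the singular value decomposition 

$$\underbrace{\begin{bmatrix} 1 & 1 \\ 0 & 1 \\ 1 & 0 \end{bmatrix}}_{A} = \underbrace{\begin{bmatrix} \frac{\sqrt{6}}{3} & 0 & -\frac{1}{\sqrt{3}} \\ \frac{\sqrt{6}}{6} & -\frac{\sqrt{2}}{2} & \frac{1}{\sqrt{3}} \\ \frac{\sqrt{6}}{6} & \frac{\sqrt{2}}{2} & \frac{1}{\sqrt{3}} \end{bmatrix}}_{U} \underbrace{\begin{bmatrix} \sqrt{3} & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix}}_{\Sigma} \underbrace{\begin{bmatrix} \frac{\sqrt{2}}{2} & \frac{\sqrt{2}}{2} \\ \frac{\sqrt{2}}{2} & -\frac{\sqrt{2}}{2} \end{bmatrix}}_{V^T} \tag{4}$$

Since A has rank 2 (verify), it follows from (1) with k = 2 that the reduced singular value 
decomposition of A corresponding to (4) is 

$$\begin{bmatrix} 1 & 1 \\ 0 & 1 \\ 1 & 0 \end{bmatrix} = \begin{bmatrix} \frac{\sqrt{6}}{3} & 0 \\ \frac{\sqrt{6}}{6} & -\frac{\sqrt{2}}{2} \\ \frac{\sqrt{6}}{6} & \frac{\sqrt{2}}{2} \end{bmatrix} \begin{bmatrix} \sqrt{3} & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} \frac{\sqrt{2}}{2} & \frac{\sqrt{2}}{2} \\ \frac{\sqrt{2}}{2} & -\frac{\sqrt{2}}{2} \end{bmatrix}$$

This yields the reduced singular value expansion 

$$A = \sigma_1 u_1 v_1^T + \sigma_2 u_2 v_2^T = \sqrt{3} \begin{bmatrix} \frac{\sqrt{6}}{3} \\ \frac{\sqrt{6}}{6} \\ \frac{\sqrt{6}}{6} \end{bmatrix} \begin{bmatrix} \frac{\sqrt{2}}{2} & \frac{\sqrt{2}}{2} \end{bmatrix} + (1) \begin{bmatrix} 0 \\ -\frac{\sqrt{2}}{2} \\ \frac{\sqrt{2}}{2} \end{bmatrix} \begin{bmatrix} \frac{\sqrt{2}}{2} & -\frac{\sqrt{2}}{2} \end{bmatrix}$$

$$= \sqrt{3} \begin{bmatrix} \frac{\sqrt{3}}{3} & \frac{\sqrt{3}}{3} \\ \frac{\sqrt{3}}{6} & \frac{\sqrt{3}}{6} \\ \frac{\sqrt{3}}{6} & \frac{\sqrt{3}}{6} \end{bmatrix} + (1) \begin{bmatrix} 0 & 0 \\ -\frac{1}{2} & \frac{1}{2} \\ \frac{1}{2} & -\frac{1}{2} \end{bmatrix}$$

Note that the matrices in the expansion have rank 1, as expected. 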
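
The expansion in this example can be checked numerically. The sketch below is plain Python written for illustration only; the helper names `outer` and `is_rank_one` are assumptions, not notation from the text. It forms each term $\sigma_i u_i v_i^T$ from the singular values and singular vectors of Example 1, adds the terms, and tests each term for rank 1 by checking that it is nonzero and that all of its 2 x 2 minors vanish.

```python
import math

def outer(u, v):
    """Outer product u v^T of two vectors, returned as a list of rows."""
    return [[ui * vj for vj in v] for ui in u]

def is_rank_one(M, tol=1e-12):
    """A nonzero matrix has rank 1 exactly when every 2 x 2 minor is zero."""
    rows, cols = len(M), len(M[0])
    nonzero = any(abs(x) > tol for row in M for x in row)
    minors_vanish = all(
        abs(M[i][j] * M[k][l] - M[i][l] * M[k][j]) <= tol
        for i in range(rows) for k in range(i + 1, rows)
        for j in range(cols) for l in range(j + 1, cols))
    return nonzero and minors_vanish

s2, s3, s6 = math.sqrt(2), math.sqrt(3), math.sqrt(6)

A = [[1, 1], [0, 1], [1, 0]]
sigmas = [s3, 1.0]
us = [[s6 / 3, s6 / 6, s6 / 6], [0.0, -s2 / 2, s2 / 2]]   # u_1, u_2
vs = [[s2 / 2, s2 / 2], [s2 / 2, -s2 / 2]]                # v_1, v_2

# One rank 1 matrix sigma_i * u_i * v_i^T per singular value.
terms = [[[s * x for x in row] for row in outer(u, v)]
         for s, u, v in zip(sigmas, us, vs)]

# Adding the terms entry by entry should reproduce A.
A_sum = [[sum(t[i][j] for t in terms) for j in range(2)] for i in range(3)]

for row in A_sum:
    print([round(x, 10) for x in row])
```

Testing the minors avoids any row reduction, which keeps the rank check short for matrices of this size.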
Data Compression and Image Processing 


Singular value decompositions can be used to “compress” visual information for the 
purpose of reducing its required storage space and speeding up its electronic transmis- 
sion. The first step in compressing a visual image is to represent it as a numerical matrix 
from which the visual image can be recovered when needed. 

For example, a black and white photograph might be scanned as a rectangular array 
of pixels (points) and then stored as a matrix A by assigning each pixel a numerical 
value in accordance with its gray level. If 256 different gray levels are used (0 = white to 
255 = black), then the entries in the matrix would be integers between 0 and 255. The 
image can be recovered from the matrix A by printing or displaying the pixels with their 
assigned gray levels. 

If the matrix A has size m x n, then one might store each of its mn entries individually. 
An alternative procedure is to compute the reduced singular value decomposition 

$$A = \sigma_1 u_1 v_1^T + \sigma_2 u_2 v_2^T + \cdots + \sigma_k u_k v_k^T \tag{5}$$

in which $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_k$, and store the $\sigma$'s, the u's, and the v's. When needed, the 
matrix A (and hence the image it represents) can be reconstructed from (5). Since each 
$u_j$ has m entries and each $v_j$ has n entries, this method requires storage space for 

$$km + kn + k = k(m + n + 1)$$

numbers. Suppose, however, that the singular values $\sigma_{r+1}, \ldots, \sigma_k$ are sufficiently small 
that dropping the corresponding terms in (5) produces an acceptable approximation 

$$A_r = \sigma_1 u_1 v_1^T + \sigma_2 u_2 v_2^T + \cdots + \sigma_r u_r v_r^T \tag{6}$$

to A and the image that it represents. We call (6) the rank r approximation of A. This 
matrix requires storage space for only 

$$rm + rn + r = r(m + n + 1)$$

numbers, compared to mn numbers required for entry-by-entry storage of A. For example, 
the rank 100 approximation of a 1000 x 1000 matrix A requires storage for only 

$$100(1000 + 1000 + 1) = 200{,}100$$

numbers, compared to the 1,000,000 numbers required for entry-by-entry storage of 
A, a compression of almost 80%. 
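
The storage count above is simple enough to tabulate for other sizes. The following sketch assumes nothing beyond the formula r(m + n + 1); the function names `svd_storage`, `compression_percent`, and `max_useful_rank` are hypothetical and chosen only for this illustration.

```python
def svd_storage(m, n, r):
    """Numbers stored for a rank r approximation: r sigmas, r u's, r v's."""
    return r * (m + n + 1)

def compression_percent(m, n, r):
    """Percentage saved relative to storing all m*n entries individually."""
    return 100.0 * (1 - svd_storage(m, n, r) / (m * n))

def max_useful_rank(m, n):
    """Largest r for which r(m + n + 1) is still smaller than m*n."""
    return (m * n - 1) // (m + n + 1)

print(svd_storage(1000, 1000, 100))                    # 200100
print(round(compression_percent(1000, 1000, 100), 2))  # 79.99
print(max_useful_rank(1000, 1000))                     # 499
```

The last function makes the trade-off explicit: for a 1000 x 1000 matrix the factored form saves space only while the rank of the approximation stays below 500.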

Figure 9.5.1 shows some approximations of a digitized mandrill image obtained 
using (6). 



▲ Figure 9.5.1 Approximations of rank 4, rank 10, rank 20, rank 50, and rank 128. 
[Image: Digital Vision/Age Fotostock America, Inc.] 

In 1924 the U.S. Federal Bureau of Investigation (FBI) began collecting fingerprints 
and handprints and now has more than 100 million such prints in its files. 
To reduce the storage cost, the FBI began working with the 
Los Alamos National Laboratory, the National Bureau of 
Standards, and other groups in 1993 to devise rank-based 
compression methods for storing prints in digital form. The 
following figure shows an original fingerprint and a reconstruction 
from digital data that was compressed at a ratio of 26:1. 

[Figure panels: Original, Reconstruction] 


Exercise Set 9.5 


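In Exercises 1-4, find a reduced singular value decomposition 
of A. [Note: Each matrix appears in Exercise Set 9.4, where you 
were asked to find its (unreduced) singular value decomposition.] 

1. $A = \begin{bmatrix} -2 & 2 \\ -1 & 1 \\ 2 & -2 \end{bmatrix}$ 

2. $A = \begin{bmatrix} -2 & -1 & 2 \\ 2 & 1 & -2 \end{bmatrix}$ 

3. $A = \begin{bmatrix} 1 & 0 \\ 1 & 1 \\ -1 & 1 \end{bmatrix}$ 

4. $A = \begin{bmatrix} 6 & 4 \\ 0 & 0 \\ 4 & 0 \end{bmatrix}$ 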

In Exercises 5-8, find a reduced singular value expansion of A. 


5. The matrix A in Exercise 1 . 

6. The matrix A in Exercise 2. 


7. The matrix A in Exercise 3. 

8. The matrix A in Exercise 4. 

9. Suppose A is a 200 x 500 matrix. How many numbers must be 
stored in the rank 100 approximation of A? Compare this with 
the number of entries of A. 

True-False Exercises 

TF. In parts (a)-(c) determine whether the statement is true or 
false, and justify your answer. Assume that $U_1 \Sigma_1 V_1^T$ is a reduced 
singular value decomposition of an m x n matrix of rank k. 

(a) $U_1$ has size m x k. 

(b) $\Sigma_1$ has size k x k. 

(c) $V_1$ has size k x n. 
1. Find an LU-decomposition of $A = \begin{bmatrix} -6 & 2 \\ 6 & 0 \end{bmatrix}$. 

2. Find the LDU-decomposition of the matrix A in Exercise 1. 

3. Find an LU-decomposition of $A = \begin{bmatrix} 2 & 4 & 6 \\ 1 & 4 & 7 \\ 1 & 3 & 7 \end{bmatrix}$. 

4. Find the LDU-decomposition of the matrix A in Exercise 3. 

5. Let $A = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}$ and $x_0 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$. 

(a) Identify the dominant eigenvalue of A and then find the 
corresponding dominant unit eigenvector v with positive 
entries. 

(b) Apply the power method with Euclidean scaling to A and 
$x_0$, stopping at $x_5$. Compare your value of $x_5$ to the 
eigenvector v found in part (a). 

(c) Apply the power method with maximum entry scaling to 
A and $x_0$, stopping at $x_5$. Compare your result with the 
eigenvector $\begin{bmatrix} 1 \\ 1 \end{bmatrix}$. 

6. Consider the symmetric matrix 

$$A = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$$

Discuss the behavior of the power sequence 

$$x_0, x_1, \ldots, x_k, \ldots$$

with Euclidean scaling for a general nonzero vector $x_0$. What 
is it about the matrix that causes the observed behavior? 

7. Suppose that a symmetric matrix A has distinct eigenvalues 
$\lambda_1 = 8$, $\lambda_2 = 1.4$, $\lambda_3 = 2.3$, and $\lambda_4 = -8.1$. What can you say 
about the convergence of the Rayleigh quotients? 

fl 1" 

8 . Find a singular value decomposition of A = 
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11 12. Do orthogonally similar matrices have the same singular val- 

9. Find a singular value decomposition of A = 0 0 . ues? Justify your answer. 

13. If jP is the standard matrix for the orthogonal projection of 

10. Find a reduced singular value decomposition and a reduced R n onto a subspace W, what can you say about the singular 

singular value expansion of the matrix A in Exercise 9. values of PI 

11. Find the reduced singular value decomposition of the matrix 14. Prove: If A has rank 1, then there exists a scalar k such that 

whose singular value decomposition is A 2 = IcA. 

Ill 
2 2 2 

1 _I _ I 

2 2 2 

1 _I I 

2 2 2 

1 I _I 

2 2 2 
INTRODUCTION This chapter consists of 20 applications of linear algebra. With one clearly marked 

exception, each application is in its own independent section, so sections can be deleted 
or permuted as desired. Each topic begins with a list of linear algebra prerequisites. 

Because our primary objective in this chapter is to present applications of linear 
algebra, proofs are often omitted. Whenever results from other fields are needed, they 
are stated precisely, with motivation where possible, but usually without proof. 
10.1 Constructing Curves and Surfaces Through Specified Points 

In this section we describe a technique that uses determinants to construct lines, circles, and 
general conic sections through specified points in the plane. The procedure is also used to 
pass planes and spheres in 3-space through fixed points. 

PREREQUISITES: Linear Systems, Determinants, Analytic Geometry 

The following theorem follows from Theorem 2.3.8. 

A homogeneous linear system with as many equations as unknowns 
has a nontrivial solution if and only if the determinant of the coefficient matrix is zero. 

We will now show how this result can be used to determine equations of various curves 
and surfaces through specified points. 

A Line Through Two Points 

Suppose that $(x_1, y_1)$ and $(x_2, y_2)$ are two distinct points in the plane. There exists a 
unique line 

$$c_1 x + c_2 y + c_3 = 0 \tag{1}$$

that passes through these two points (Figure 10.1.1). Note that $c_1$, $c_2$, and $c_3$ are not all 
zero and that these coefficients are unique only up to a multiplicative constant. Because 
$(x_1, y_1)$ and $(x_2, y_2)$ lie on the line, substituting them in (1) gives the two equations 

$$c_1 x_1 + c_2 y_1 + c_3 = 0 \tag{2}$$

$$c_1 x_2 + c_2 y_2 + c_3 = 0 \tag{3}$$

The three equations, (1), (2), and (3), can be grouped together and rewritten as 

$$\begin{aligned} x c_1 + y c_2 + c_3 &= 0 \\ x_1 c_1 + y_1 c_2 + c_3 &= 0 \\ x_2 c_1 + y_2 c_2 + c_3 &= 0 \end{aligned}$$

which is a homogeneous linear system of three equations for $c_1$, $c_2$, and $c_3$. Because $c_1$, 
$c_2$, and $c_3$ are not all zero, this system has a nontrivial solution, so the determinant of 
the coefficient matrix of the system must be zero. That is, 

$$\begin{vmatrix} x & y & 1 \\ x_1 & y_1 & 1 \\ x_2 & y_2 & 1 \end{vmatrix} = 0 \tag{4}$$

Consequently, every point (x, y) on the line satisfies (4); conversely, it can be shown that 
every point (x, y) that satisfies (4) lies on the line. 

▲ Figure 10.1.1 
► EXAMPLE 1 Equation of a Line 

Find the equation of the line that passes through the two points (2, 1) and (3, 7). 

Solution Substituting the coordinates of the two points into Equation (4) gives 

$$\begin{vmatrix} x & y & 1 \\ 2 & 1 & 1 \\ 3 & 7 & 1 \end{vmatrix} = 0$$

The cofactor expansion of this determinant along the first row then gives 

$$-6x + y + 11 = 0$$

A Circle Through Three Points 

Suppose that there are three distinct points in the plane, $(x_1, y_1)$, $(x_2, y_2)$, and $(x_3, y_3)$, 
not all lying on a straight line. From analytic geometry we know that there is a unique 
circle, say, 

$$c_1 (x^2 + y^2) + c_2 x + c_3 y + c_4 = 0 \tag{5}$$

that passes through them (Figure 10.1.2). Substituting the coordinates of the three points 
into this equation gives 

$$c_1 (x_1^2 + y_1^2) + c_2 x_1 + c_3 y_1 + c_4 = 0 \tag{6}$$

$$c_1 (x_2^2 + y_2^2) + c_2 x_2 + c_3 y_2 + c_4 = 0 \tag{7}$$

$$c_1 (x_3^2 + y_3^2) + c_2 x_3 + c_3 y_3 + c_4 = 0 \tag{8}$$

As before, Equations (5) through (8) form a homogeneous linear system with a nontrivial 
solution for $c_1$, $c_2$, $c_3$, and $c_4$. Thus the determinant of the coefficient matrix is zero: 

$$\begin{vmatrix} x^2 + y^2 & x & y & 1 \\ x_1^2 + y_1^2 & x_1 & y_1 & 1 \\ x_2^2 + y_2^2 & x_2 & y_2 & 1 \\ x_3^2 + y_3^2 & x_3 & y_3 & 1 \end{vmatrix} = 0 \tag{9}$$

This is a determinant form for the equation of the circle. 

▲ Figure 10.1.2 


► EXAMPLE 2 Equation of a Circle 

Find the equation of the circle that passes through the three points (1, 7), (6, 2), and 
(4, 6). 

Solution Substituting the coordinates of the three points into Equation (9) gives 

$$\begin{vmatrix} x^2 + y^2 & x & y & 1 \\ 50 & 1 & 7 & 1 \\ 40 & 6 & 2 & 1 \\ 52 & 4 & 6 & 1 \end{vmatrix} = 0$$

which reduces to 

$$10(x^2 + y^2) - 20x - 40y - 200 = 0$$

In standard form this is 

$$(x - 1)^2 + (y - 2)^2 = 5^2$$

Thus the circle has center (1, 2) and radius 5. 
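
The cofactor expansions in Examples 1 and 2 follow one pattern: the coefficient of the j-th entry of the symbolic first row is $(-1)^j$ times the determinant of the numeric rows with column j deleted (counting columns from 0). The following is a small sketch of that pattern in plain Python; the function names `det`, `first_row_cofactors`, `line_through`, and `circle_through` are assumptions made for illustration and do not come from the text.

```python
from fractions import Fraction

def det(M):
    """Determinant by cofactor expansion along the first row (small matrices only)."""
    if len(M) == 1:
        return M[0][0]
    total = 0
    for j, entry in enumerate(M[0]):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * entry * det(minor)
    return total

def first_row_cofactors(point_rows):
    """Cofactors of the symbolic first row, given the numeric rows beneath it."""
    n = len(point_rows[0])
    return [(-1) ** j * det([row[:j] + row[j + 1:] for row in point_rows])
            for j in range(n)]

def line_through(p, q):
    """Coefficients of x, y, 1 in determinant form (4)."""
    return first_row_cofactors([[x, y, 1] for x, y in (p, q)])

def circle_through(p, q, r):
    """Coefficients of x^2 + y^2, x, y, 1 in determinant form (9)."""
    return first_row_cofactors([[x * x + y * y, x, y, 1] for x, y in (p, q, r)])

line = line_through((2, 1), (3, 7))
circle = circle_through((1, 7), (6, 2), (4, 6))
print(line)    # [-6, 1, 11]
print(circle)  # [10, -20, -40, -200]

# Completing the square recovers the center and squared radius of the circle.
c1, c2, c3, c4 = circle
center = (Fraction(-c2, 2 * c1), Fraction(-c3, 2 * c1))
radius_squared = center[0] ** 2 + center[1] ** 2 - Fraction(c4, c1)
print(center, radius_squared)
```

Recursive cofactor expansion is used here because it mirrors the hand computation; for larger determinants, elimination would be the more practical choice.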




A General Conic Section Through Five Points 

In his monumental work Principia Mathematica, Isaac Newton posed and solved the 
following problem (Book I, Proposition 22, Problem 14): “To describe a conic that shall 
pass through five given points.” Newton solved this problem geometrically, as shown in 
Figure 10.1.3, in which he passed an ellipse through the points A, B, D, P, C; however, 
the methods of this section can also be applied. 

▲ Figure 10.1.3 

The general equation of a conic section in the plane (a parabola, hyperbola, or ellipse, 
or degenerate forms of these curves) is given by 

$$c_1 x^2 + c_2 xy + c_3 y^2 + c_4 x + c_5 y + c_6 = 0$$

This equation contains six coefficients, but we can reduce the number to five if we 
divide through by any one of them that is not zero. Thus only five coefficients must be 
determined, so five distinct points in the plane are sufficient to determine the equation 
of the conic section (Figure 10.1.4). As before, the equation can be put in determinant 
form (see Exercise 7): 

$$\begin{vmatrix} x^2 & xy & y^2 & x & y & 1 \\ x_1^2 & x_1 y_1 & y_1^2 & x_1 & y_1 & 1 \\ x_2^2 & x_2 y_2 & y_2^2 & x_2 & y_2 & 1 \\ x_3^2 & x_3 y_3 & y_3^2 & x_3 & y_3 & 1 \\ x_4^2 & x_4 y_4 & y_4^2 & x_4 & y_4 & 1 \\ x_5^2 & x_5 y_5 & y_5^2 & x_5 & y_5 & 1 \end{vmatrix} = 0 \tag{10}$$
► EXAMPLE 3 Equation of an Orbit 

An astronomer who wants to determine the orbit of an asteroid about the Sun sets 
up a Cartesian coordinate system in the plane of the orbit with the Sun at the origin. 
Astronomical units of measurement are used along the axes (1 astronomical unit = mean 
distance of Earth to Sun = 93 million miles). By Kepler’s first law, the orbit must be an 
ellipse, so the astronomer makes five observations of the asteroid at five different times 
and finds five points along the orbit to be 

(8.025, 8.310), (10.170, 6.355), (11.202, 3.212), (10.736, 0.375), (9.092, -2.267) 

Find the equation of the orbit. 

Solution Substituting the coordinates of the five given points into (10) and rounding to 
three decimal places give 

$$\begin{vmatrix} x^2 & xy & y^2 & x & y & 1 \\ 64.401 & 66.688 & 69.056 & 8.025 & 8.310 & 1 \\ 103.429 & 64.630 & 40.386 & 10.170 & 6.355 & 1 \\ 125.485 & 35.981 & 10.317 & 11.202 & 3.212 & 1 \\ 115.262 & 4.026 & 0.141 & 10.736 & 0.375 & 1 \\ 82.664 & -20.612 & 5.139 & 9.092 & -2.267 & 1 \end{vmatrix} = 0$$

The cofactor expansion of this determinant along the first row yields 

$$386.802x^2 - 102.895xy + 446.029y^2 - 2476.443x - 1427.998y - 17109.375 = 0$$

Figure 10.1.5 is an accurate diagram of the orbit, together with the five given points. 

► Figure 10.1.5 

A Plane Through Three Points 

In Exercise 8 we ask you to show the following: The plane in 3-space with equation 

$$c_1 x + c_2 y + c_3 z + c_4 = 0$$

that passes through three noncollinear points $(x_1, y_1, z_1)$, $(x_2, y_2, z_2)$, and $(x_3, y_3, z_3)$ is 
given by the determinant equation 

$$\begin{vmatrix} x & y & z & 1 \\ x_1 & y_1 & z_1 & 1 \\ x_2 & y_2 & z_2 & 1 \\ x_3 & y_3 & z_3 & 1 \end{vmatrix} = 0 \tag{11}$$

► EXAMPLE 4 Equation of a Plane 

The equation of the plane that passes through the three noncollinear points (1, 1, 0), 
(2, 0, -1), and (2, 9, 2) is 

$$\begin{vmatrix} x & y & z & 1 \\ 1 & 1 & 0 & 1 \\ 2 & 0 & -1 & 1 \\ 2 & 9 & 2 & 1 \end{vmatrix} = 0$$

which reduces to 

$$2x - y + 3z - 1 = 0$$

A Sphere Through Four Points 

In Exercise 9 we ask you to show the following: The sphere in 3-space with equation 

$$c_1 (x^2 + y^2 + z^2) + c_2 x + c_3 y + c_4 z + c_5 = 0$$

that passes through four noncoplanar points $(x_1, y_1, z_1)$, $(x_2, y_2, z_2)$, $(x_3, y_3, z_3)$, and 
$(x_4, y_4, z_4)$ is given by the following determinant equation: 

$$\begin{vmatrix} x^2 + y^2 + z^2 & x & y & z & 1 \\ x_1^2 + y_1^2 + z_1^2 & x_1 & y_1 & z_1 & 1 \\ x_2^2 + y_2^2 + z_2^2 & x_2 & y_2 & z_2 & 1 \\ x_3^2 + y_3^2 + z_3^2 & x_3 & y_3 & z_3 & 1 \\ x_4^2 + y_4^2 + z_4^2 & x_4 & y_4 & z_4 & 1 \end{vmatrix} = 0 \tag{12}$$
► EXAMPLE 5 Equation of a Sphere 

The equation of the sphere that passes through the four points (0, 3, 2), (1, -1, 1), 
(2, 1, 0), and (5, 1, 3) is 

$$\begin{vmatrix} x^2 + y^2 + z^2 & x & y & z & 1 \\ 13 & 0 & 3 & 2 & 1 \\ 3 & 1 & -1 & 1 & 1 \\ 5 & 2 & 1 & 0 & 1 \\ 35 & 5 & 1 & 3 & 1 \end{vmatrix} = 0$$

This reduces to 

$$x^2 + y^2 + z^2 - 4x - 2y - 6z + 5 = 0$$

which in standard form is 

$$(x - 2)^2 + (y - 1)^2 + (z - 3)^2 = 9$$
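
A quick way to gain confidence in a result like this is to substitute the given points back into the reduced equation and then complete the square. The sketch below does both in plain Python with exact rational arithmetic; the variable names are illustrative assumptions, and the coefficients are those of the reduced equation in Example 5.

```python
from fractions import Fraction

points = [(0, 3, 2), (1, -1, 1), (2, 1, 0), (5, 1, 3)]

# Reduced equation: x^2 + y^2 + z^2 + a*x + b*y + c*z + d = 0
a, b, c, d = -4, -2, -6, 5

# Each given point should make the left side vanish.
residuals = [x * x + y * y + z * z + a * x + b * y + c * z + d
             for x, y, z in points]

# Completing the square gives center (-a/2, -b/2, -c/2)
# and squared radius |center|^2 - d.
center = (Fraction(-a, 2), Fraction(-b, 2), Fraction(-c, 2))
radius_squared = sum(t * t for t in center) - d

print(residuals)
print(center, radius_squared)
```

The residuals are all zero and the center and squared radius agree with the standard form, so the expansion of the determinant was carried out consistently.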


Exercise Set 10.1 

1. Find the equations of the lines that pass through the following 
points: 

(a) (1,-1), (2,2) (b) (0, 1), (1,-1) 

2. Find the equations of the circles that pass through the follow- 
ing points: 

(a) (2, 6), (2, 0), (5, 3) (b) (2, -2), (3, 5), (-4, 6) 

3. Find the equation of the conic section that passes through the 
points (0, 0), (0, -1), (2, 0), (2, -5), and (4, -1). 

4. Find the equations of the planes in 3-space that pass through 
the following points: 

(a) (1, 1,-3), (1,-1, 1), (0,-1, 2) 

(b) (2,3,1), (2, -1,-1), (1,2,1) 

5. (a) Alter Equation (11) so that it determines the plane that 

passes through the origin and is parallel to the plane that 
passes through three specified noncollinear points. 

(b) Find the two planes described in part (a) corresponding 
to the triplets of points in Exercises 4(a) and 4(b). 

6. Find the equations of the spheres in 3-space that pass through 
the following points: 

(a) (1,2,3), (-1,2, 1), (1,0, 1), (1,2, -1) 

(b) (0, 1, -2), (1, 3, 1), (2, -1, 0), (3, 1, -1) 

7. Show that Equation (10) is the equation of the conic section 
that passes through five given distinct points in the plane. 

8. Show that Equation (11) is the equation of the plane in 3-space 
that passes through three given noncollinear points. 

9. Show that Equation (12) is the equation of the sphere in 3- 
space that passes through four given noncoplanar points. 


10. Find a determinant equation for the parabola of the form 

$$c_1 y + c_2 x^2 + c_3 x + c_4 = 0$$

that passes through three given noncollinear points in the 
plane. 

11. What does Equation (9) become if the three distinct points are 
collinear? 

12. What does Equation (11) become if the three distinct points 
are collinear? 

13. What does Equation (12) become if the four points are co- 
planar? 

Working with Technology 

The following exercises are designed to be solved using a technology 
utility. Typically, this will be MATLAB, Mathematica, Maple, 
Derive, or Mathcad, but it may also be some other type of linear 
algebra software or a scientific calculator with some linear algebra 
capabilities. For each exercise you will need to read the relevant 
documentation for the particular utility you are using. The goal 
of these exercises is to provide you with a basic proficiency with 
your technology utility. Once you have mastered the techniques 
in these exercises, you will be able to use your technology utility 
to solve many of the problems in the regular exercise sets. 

T1. The general equation of a quadric surface is given by 

$$a_1 x^2 + a_2 y^2 + a_3 z^2 + a_4 xy + a_5 xz + a_6 yz + a_7 x + a_8 y + a_9 z + a_{10} = 0$$

Given nine points on this surface, it may be possible to determine 
its equation. 

(a) Show that if the nine points $(x_i, y_i, z_i)$ for $i = 1, 2, 3, \ldots, 9$ lie 
on this surface, and if they determine uniquely the equation of 
this surface, then its equation can be written in determinant 
form as 

$$\begin{vmatrix} x^2 & y^2 & z^2 & xy & xz & yz & x & y & z & 1 \\ x_1^2 & y_1^2 & z_1^2 & x_1 y_1 & x_1 z_1 & y_1 z_1 & x_1 & y_1 & z_1 & 1 \\ x_2^2 & y_2^2 & z_2^2 & x_2 y_2 & x_2 z_2 & y_2 z_2 & x_2 & y_2 & z_2 & 1 \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\ x_9^2 & y_9^2 & z_9^2 & x_9 y_9 & x_9 z_9 & y_9 z_9 & x_9 & y_9 & z_9 & 1 \end{vmatrix} = 0$$

(b) Use the result in part (a) to determine the equation 
of the quadric surface that passes through the points 
(1, 2, 3), (2, 1, 7), (0, 4, 6), (3, -1, 4), (3, 0, 11), (-1, 5, 8), 
(9, -8, 3), (4, 5, 3), and (-2, 6, 10). 

T2. (a) A hyperplane in the n-dimensional Euclidean space $R^n$ 
has an equation of the form 

$$a_1 x_1 + a_2 x_2 + a_3 x_3 + \cdots + a_n x_n + a_{n+1} = 0$$

where $a_i$, $i = 1, 2, 3, \ldots, n + 1$, are constants, not all zero, 
and $x_i$, $i = 1, 2, 3, \ldots, n$, are variables for which 

$$(x_1, x_2, x_3, \ldots, x_n) \in R^n$$

A point 

$$(x_{10}, x_{20}, x_{30}, \ldots, x_{n0}) \in R^n$$

lies on this hyperplane if 

$$a_1 x_{10} + a_2 x_{20} + a_3 x_{30} + \cdots + a_n x_{n0} + a_{n+1} = 0$$

Given that the n points $(x_{1i}, x_{2i}, x_{3i}, \ldots, x_{ni})$, $i = 1, 2, 3, \ldots, n$, 
lie on this hyperplane and that they uniquely determine the 
equation of the hyperplane, show that the equation of the 
hyperplane can be written in determinant form as 

$$\begin{vmatrix} x_1 & x_2 & x_3 & \cdots & x_n & 1 \\ x_{11} & x_{21} & x_{31} & \cdots & x_{n1} & 1 \\ x_{12} & x_{22} & x_{32} & \cdots & x_{n2} & 1 \\ x_{13} & x_{23} & x_{33} & \cdots & x_{n3} & 1 \\ \vdots & \vdots & \vdots & & \vdots & \vdots \\ x_{1n} & x_{2n} & x_{3n} & \cdots & x_{nn} & 1 \end{vmatrix} = 0$$

(b) Determine the equation of the hyperplane in $R^9$ that goes 
through the following nine points: 

(1, 2, 3, 4, 5, 6, 7, 8, 9) 
(2, 3, 4, 5, 6, 7, 8, 9, 1) 
(3, 4, 5, 6, 7, 8, 9, 1, 2) 
(4, 5, 6, 7, 8, 9, 1, 2, 3) 
(5, 6, 7, 8, 9, 1, 2, 3, 4) 
(6, 7, 8, 9, 1, 2, 3, 4, 5) 
(7, 8, 9, 1, 2, 3, 4, 5, 6) 
(8, 9, 1, 2, 3, 4, 5, 6, 7) 
(9, 1, 2, 3, 4, 5, 6, 7, 8) 

10.2 The Earliest Applications of Linear Algebra 

Linear systems can be found in the earliest writings of many ancient civilizations. In this 
section we give some examples of the types of problems that they used to solve. 


Linear Systems 


The practical problems of early civilizations included the measurement of land, the 
distribution of goods, the tracking of resources such as wheat and cattle, and taxation and 
inheritance calculations. In many cases, these problems led to linear systems of equations 
since linearity is one of the simplest relationships that can exist among variables. In this 
section we present examples from five diverse ancient cultures illustrating how they used 
and solved systems of linear equations. We restrict ourselves to examples before A.D. 
500. These examples consequently predate the development of the field of algebra by 
Islamic/Arab mathematicians, a field that ultimately led in the nineteenth century to the 
branch of mathematics now called linear algebra. 
► EXAMPLE 1 Egypt (about 1650 B.C.)

Problem 40 of the Ahmes Papyrus

[Image: © The Trustees of the British Museum] 


The Ahmes (or Rhind) Papyrus is the source of most of our information about ancient 
Egyptian mathematics. This 5-meter-long papyrus contains 84 short mathematical prob- 
lems, together with their solutions, and dates from about 1650 B.C. Problem 40 in this 
papyrus is the following: 

Divide 100 hekats of barley among five men in arithmetic progression so that the sum of 
the two smallest is one-seventh the sum of the three largest. 

Let $a$ be the least amount that any man obtains, and let $d$ be the common difference of
the terms in the arithmetic progression. Then the other four men receive $a + d$, $a + 2d$,
$a + 3d$, and $a + 4d$ hekats. The two conditions of the problem require that

\[
\begin{aligned}
a + (a + d) + (a + 2d) + (a + 3d) + (a + 4d) &= 100 \\
\tfrac{1}{7}\left[(a + 2d) + (a + 3d) + (a + 4d)\right] &= a + (a + d)
\end{aligned}
\]

These equations reduce to the following system of two equations in two unknowns:

\[
\begin{aligned}
5a + 10d &= 100 \\
11a - 2d &= 0
\end{aligned}
\tag{1}
\]

The solution technique described in the papyrus is known as the method of false position
or false assumption. It begins by assuming some convenient value of $a$ (in our case
$a = 1$), substituting that value into the second equation, and obtaining $d = 11/2$.
Substituting $a = 1$ and $d = 11/2$ into the left-hand side of the first equation gives 60, whereas
the right-hand side is 100. Adjusting the initial guess for $a$ by multiplying it by 100/60
leads to the correct value $a = 5/3$. Substituting $a = 5/3$ into the second equation then
gives $d = 55/6$, so the quantities of barley received by the five men are 10/6, 65/6, 120/6,
175/6, and 230/6 hekats. This technique of guessing a value of an unknown and later
adjusting it has been used by many cultures throughout the ages.
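The false-position steps described above can be traced in a few lines of code. The following is only a sketch in Python using exact fractions; the function name `false_position` and its argument are illustrative choices, not anything taken from the papyrus.

```python
from fractions import Fraction

def false_position(guess_a=Fraction(1)):
    """Solve 5a + 10d = 100, 11a - 2d = 0 by the method of false position."""
    # The second equation, 11a - 2d = 0, fixes d once a value of a is assumed.
    d_guess = Fraction(11, 2) * guess_a
    # Left-hand side of the first equation for the trial values (60 when a = 1).
    trial_total = 5 * guess_a + 10 * d_guess
    # The second equation is homogeneous, so rescaling the guess keeps it
    # satisfied; the factor 100 / trial_total repairs the first equation.
    scale = Fraction(100) / trial_total
    a = guess_a * scale
    d = Fraction(11, 2) * a
    return a, d

a, d = false_position()
shares = [a + i * d for i in range(5)]   # the five men's portions
print(a, d)                              # 5/3 55/6
print(sum(shares))                       # 100
```

Any nonzero starting guess gives the same answer, because only the ratio of the required total to the trial total is used.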



► EXAMPLE 2 Babylonia (1900-1600 B.C.) 

The Old Babylonian Empire flourished in Mesopotamia between 1900 and 1600 B.C.
Many clay tablets containing mathematical tables and problems survive from that period, 
one of which (designated Ca MLA 1950) contains the next problem. The statement of 
the problem is a bit muddled because of the condition of the tablet, but the diagram and 
the solution on the tablet indicate that the problem is as follows: 


Babylonian clay tablet Ca MLA 1950
[Image: American Oriental Society/American Schools of Oriental Research]

Chiu Chang Suan Shu in Chinese characters



A trapezoid with an area of 320 square units is cut off from a right triangle by a line 
parallel to one of its sides. The other side has length 50 units, and the height of the 
trapezoid is 20 units. What are the upper and the lower widths of the trapezoid? 

Let $x$ be the lower width of the trapezoid and $y$ its upper width. The area of the trapezoid
is its height times its average width, so $20\left(\frac{x + y}{2}\right) = 320$. Using similar triangles, we also
have $\frac{x}{50} = \frac{y}{30}$. The solution on the tablet uses these relations to generate the linear system

\[
\begin{aligned}
\tfrac{1}{2}(x + y) &= 16 \\
\tfrac{1}{2}(x - y) &= 4
\end{aligned}
\tag{2}
\]

Adding and subtracting these two equations then gives the solution $x = 20$ and $y = 12$.


► EXAMPLE 3 China (A.D. 263)

The most important treatise in the history of Chinese mathematics is the Chiu Chang 
Suan Shu, or “The Nine Chapters of the Mathematical Art.” This treatise, which is a 
collection of 246 problems and their solutions, was assembled in its final form by Liu 
Hui in A.D. 263. Its contents, however, go back to at least the beginning of the Han 
dynasty in the second century B.C. The eighth of its nine chapters, entitled “The Way of 
Calculating by Arrays,” contains 18 word problems that lead to linear systems in three 
to six unknowns. The general solution procedure described is almost identical to the 
Gaussian elimination technique developed in Europe in the nineteenth century by Carl 
Friedrich Gauss (see page 15). The first problem in the eighth chapter is the following: 

There are three classes of corn, of which three bundles of the first class, two of the second, 
and one of the third make 39 measures. Two of the first, three of the second, and one 
of the third make 34 measures. And one of the first, two of the second, and three of the 
third make 26 measures. How many measures of grain are contained in one bundle of 
each class? 

Let x, y, and z be the measures of the first, second, and third classes of corn. Then the 
conditions of the problem lead to the following linear system of three equations in three 
unknowns: 


\[
\begin{aligned}
3x + 2y + z &= 39 \\
2x + 3y + z &= 34 \\
x + 2y + 3z &= 26
\end{aligned}
\tag{3}
\]
The solution described in the treatise represented the coefficients of each equation by
an appropriate number of rods placed within squares on a counting table. Positive 
coefficients were represented by black rods, negative coefficients were represented by red 
rods, and the squares corresponding to zero coefficients were left empty. The counting 
table was laid out as follows so that the coefficients of each equation appear in columns 
with the first equation in the rightmost column: 


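\[
\begin{array}{|c|c|c|}
\hline
1 & 2 & 3 \\ \hline
2 & 3 & 2 \\ \hline
3 & 1 & 1 \\ \hline
26 & 34 & 39 \\ \hline
\end{array}
\]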


Next, the numbers of rods within the squares were adjusted to accomplish the following 
two steps: (1) two times the numbers of the third column were subtracted from three 
times the numbers in the second column and (2) the numbers in the third column were 
subtracted from three times the numbers in the first column. The result was the following 
array: 




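\[
\begin{array}{|c|c|c|}
\hline
 &  & 3 \\ \hline
4 & 5 & 2 \\ \hline
8 & 1 & 1 \\ \hline
39 & 24 & 39 \\ \hline
\end{array}
\]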


In this array, four times the numbers in the second column were subtracted from five 
times the numbers in the first column, yielding 




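\[
\begin{array}{|c|c|c|}
\hline
 &  & 3 \\ \hline
 & 5 & 2 \\ \hline
36 & 1 & 1 \\ \hline
99 & 24 & 39 \\ \hline
\end{array}
\]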



This last array is equivalent to the linear system 

\[
\begin{aligned}
3x + 2y + z &= 39 \\
5y + z &= 24 \\
36z &= 99
\end{aligned}
\]

This triangular system was solved by a method equivalent to back substitution to obtain
$x = 37/4$, $y = 17/4$, and $z = 11/4$.
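The column operations on the counting table can be replayed directly. The sketch below is a modern restatement in Python, not the procedure as worded in the treatise; the names `col1`, `col2`, `col3`, and `combine` are illustrative. Each list holds one column of the table from top to bottom (coefficients of x, y, z, then the right-hand side), with `col1` the leftmost column.

```python
from fractions import Fraction

# Columns of the counting table, read top to bottom.
col1 = [1, 2, 3, 26]   # leftmost column:  x + 2y + 3z = 26
col2 = [2, 3, 1, 34]   # middle column:   2x + 3y +  z = 34
col3 = [3, 2, 1, 39]   # rightmost column: 3x + 2y + z = 39

def combine(m, target, n, pivot):
    """Return m * target - n * pivot, entry by entry."""
    return [m * t - n * p for t, p in zip(target, pivot)]

# Step (1): three times the second column minus two times the third.
col2 = combine(3, col2, 2, col3)    # [0, 5, 1, 24]
# Step (2): three times the first column minus the third.
col1 = combine(3, col1, 1, col3)    # [0, 4, 8, 39]
# Final step: five times the first column minus four times the second.
col1 = combine(5, col1, 4, col2)    # [0, 0, 36, 99]

# Back substitution on the triangular system.
z = Fraction(col1[3], col1[2])
y = (col2[3] - col2[2] * z) / col2[1]
x = (col3[3] - col3[1] * y - col3[2] * z) / col3[0]
print(x, y, z)                      # 37/4 17/4 11/4
```

Working with integer multiples of whole columns, as the rod method does, avoids fractions until the final back substitution.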


► EXAMPLE 4 Greece (third century B.C.)

Perhaps the most famous system of linear equations from antiquity is the one associated 
with the first part of Archimedes’ celebrated Cattle Problem. This problem supposedly 
was posed by Archimedes as a challenge to his colleague Eratosthenes. No solution has 
come down to us from ancient times, so that it is not known how, or even whether, either 
of these two geometers solved it. 


Archimedes c. 287-212 B.C. 
If thou art diligent and wise, O stranger, compute the number of cattle of the Sun, who
once upon a time grazed on the fields of the Thrinacian isle of Sicily, divided into four 
herds of different colors, one milk white, another glossy black, a third yellow, and the 
last dappled. In each herd were bulls, mighty in number according to these proportions: 
Understand, stranger, that the white bulls were equal to a half and a third of the black 
together with the whole of the yellow, while the black were equal to the fourth part of 
the dappled and a fifth, together with, once more, the whole of the yellow. Observe 
further that the remaining bulls, the dappled, were equal to a sixth part of the white and 
a seventh, together with all of the yellow. These were the proportions of the cows: The 
white were precisely equal to the third part and a fourth of the whole herd of the black; 
while the black were equal to the fourth part once more of the dappled and with it a fifth 
part, when all, including the bulls, went to pasture together. Now the dappled in four 
parts were equal in number to a fifth part and a sixth of the yellow herd. Finally the 
yellow were in number equal to a sixth part and a seventh of the white herd. If thou canst 
accurately tell, O stranger, the number of cattle of the Sun, giving separately the number 
of well-fed bulls and again the number of females according to each color, thou wouldst 
not be called unskilled or ignorant of numbers, but not yet shalt thou be numbered among 
the wise. 


The conventional designation of the eight variables in this problem is

W = number of white bulls
B = number of black bulls
Y = number of yellow bulls
D = number of dappled bulls
w = number of white cows
b = number of black cows
y = number of yellow cows
d = number of dappled cows


The problem can now be stated as the following seven homogeneous equations in eight
unknowns:

1. $W = \left(\tfrac{1}{2} + \tfrac{1}{3}\right)B + Y$
(The white bulls were equal to a half and a third of
the black [bulls] together with the whole of the yellow
[bulls].)

2. $B = \left(\tfrac{1}{4} + \tfrac{1}{5}\right)D + Y$
(The black [bulls] were equal to the fourth part of the
dappled [bulls] and a fifth, together with, once more,
the whole of the yellow [bulls].)

3. $D = \left(\tfrac{1}{6} + \tfrac{1}{7}\right)W + Y$
(The remaining bulls, the dappled, were equal to a sixth
part of the white [bulls] and a seventh, together with all
of the yellow [bulls].)

4. $w = \left(\tfrac{1}{3} + \tfrac{1}{4}\right)(B + b)$
(The white [cows] were precisely equal to the third part
and a fourth of the whole herd of the black.)

5. $b = \left(\tfrac{1}{4} + \tfrac{1}{5}\right)(D + d)$
(The black [cows] were equal to the fourth part once
more of the dappled and with it a fifth part, when all,
including the bulls, went to pasture together.)

6. $d = \left(\tfrac{1}{5} + \tfrac{1}{6}\right)(Y + y)$
(The dappled [cows] in four parts [that is, in totality]
were equal in number to a fifth part and a sixth of the
yellow herd.)

7. $y = \left(\tfrac{1}{6} + \tfrac{1}{7}\right)(W + w)$
(The yellow [cows] were in number equal to a sixth part
and a seventh of the white herd.)

As we ask you to show in the exercises, this system has infinitely many solutions of the 
form 


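\[
\begin{aligned}
W &= 10{,}366{,}482\,k \\
B &= 7{,}460{,}514\,k \\
Y &= 4{,}149{,}387\,k \\
D &= 7{,}358{,}060\,k \\
w &= 7{,}206{,}360\,k \\
b &= 4{,}893{,}246\,k \\
y &= 5{,}439{,}213\,k \\
d &= 3{,}515{,}820\,k
\end{aligned}
\tag{4}
\]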


where k is any real number. The values k = 1,2,... give infinitely many positive integer 
solutions to the problem, with k = 1 giving the smallest solution. 
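A quick way to gain confidence in these numbers is to substitute them back into the seven equations using exact rational arithmetic. The following Python sketch does only that check; it does not derive the solution, and the names `BASE` and `satisfies_cattle_equations` are illustrative.

```python
from fractions import Fraction as F

# The k = 1 solution quoted in the text.
BASE = dict(W=10_366_482, B=7_460_514, Y=4_149_387, D=7_358_060,
            w=7_206_360, b=4_893_246, y=5_439_213, d=3_515_820)

def satisfies_cattle_equations(k=1):
    """Check the seven equations of the first part for the multiple k."""
    v = {name: k * value for name, value in BASE.items()}
    W, B, Y, D = v["W"], v["B"], v["Y"], v["D"]
    w, b, y, d = v["w"], v["b"], v["y"], v["d"]
    return all([
        W == (F(1, 2) + F(1, 3)) * B + Y,
        B == (F(1, 4) + F(1, 5)) * D + Y,
        D == (F(1, 6) + F(1, 7)) * W + Y,
        w == (F(1, 3) + F(1, 4)) * (B + b),
        b == (F(1, 4) + F(1, 5)) * (D + d),
        d == (F(1, 5) + F(1, 6)) * (Y + y),
        y == (F(1, 6) + F(1, 7)) * (W + w),
    ])

print(satisfies_cattle_equations(1))   # True
print(sum(BASE.values()))              # 50389082 cattle in the smallest herd
```

Because the system is homogeneous, the same check succeeds for every multiple k of the base solution.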



Fragment III-5-3v of the 
Bakhshali Manuscript 
[Image: Bodleian Library, University 
of Oxford, MS. Sansk. d. 14, 
fragment III 5 3 v. ] 


► EXAMPLE 5 India (fourth century A.D.)

The Bakhshali Manuscript is an ancient work of Indian/Hindu mathematics dating from 
around the fourth century A.D., although some of its materials undoubtedly come from 
many centuries before. It consists of about 70 leaves or sheets of birch bark containing 
mathematical problems and their solutions. Many of its problems are so-called equal- 
ization problems that lead to systems of linear equations. One such problem on the frag- 
ment shown is the following: 

One merchant has seven asava horses, a second has nine haya horses, and a third has ten 
camels. They are equally well off in the value of their animals if each gives two animals, 
one to each of the others. Find the price of each animal and the total value of the animals 
possessed by each merchant. 


Let $x$ be the price of an asava horse, let $y$ be the price of a haya horse, let $z$ be the price
of a camel, and let $K$ be the total value of the animals possessed by each merchant.
Then the conditions of the problem lead to the following system of equations:

\[
\begin{aligned}
5x + y + z &= K \\
x + 7y + z &= K \\
x + y + 8z &= K
\end{aligned}
\tag{5}
\]

The method of solution described in the manuscript begins by subtracting the quantity
$(x + y + z)$ from both sides of the three equations to obtain $4x = 6y = 7z = K - (x + y + z)$.
This shows that if the prices $x$, $y$, and $z$ are to be integers, then the quantity
$K - (x + y + z)$ must be an integer that is divisible by 4, 6, and 7. The manuscript
takes the product of these three numbers, or 168, for the value of $K - (x + y + z)$,
which yields $x = 42$, $y = 28$, and $z = 24$ for the prices and $K = 262$ for the total value.
(See Exercise 6 for more solutions to this problem.)
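The manuscript's reasoning translates into a short computation: pick a value for the common quantity 4x = 6y = 7z and read off the prices. The Python sketch below assumes that reading; the function name `equalization_prices` and the parameter `common` are illustrative.

```python
def equalization_prices(common):
    """Prices and total value when 4x = 6y = 7z = K - (x + y + z) = common."""
    if common % 4 or common % 6 or common % 7:
        raise ValueError("common must be divisible by 4, 6, and 7")
    x, y, z = common // 4, common // 6, common // 7
    K = common + x + y + z
    return x, y, z, K

x, y, z, K = equalization_prices(4 * 6 * 7)   # the manuscript's choice, 168
print(x, y, z, K)                             # 42 28 24 262

# The three merchants end up equally well off after the exchange.
print(5 * x + y + z == x + 7 * y + z == x + y + 8 * z == K)   # True
```

Any other value divisible by 4, 6, and 7 gives another integer solution, which is the one-parameter family explored in Exercise 6.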
Exercise Set 10.2

1. The following lines from Book 12 of Homer’s Odyssey relate a 
precursor of Archimedes’ Cattle Problem: 

Thou shalt ascend the isle triangular. 

Where many oxen of the Sun are fed. 

And fatted flocks. Of oxen fifty head 

In every herd feed, and their herds are seven; 

And of his fat flocks is their number even. 

The last line means that there are as many sheep in all the flocks 
as there are oxen in all the herds. What is the total number of 
oxen and sheep that belong to the god of the Sun? (This was a 
difficult problem in Homer’s day.) 

2. Solve the following problems from the Bakhshali Manuscript. 

(a) B possesses two times as much as A; C has three times as 
much as A and B together; D has four times as much as A, 
B, and C together. Their total possessions are 300. What is 
the possession of A? 

(b) B gives 2 times as much as A; C gives 3 times as much as B; 
D gives 4 times as much as C. Their total gift is 132. What 
is the gift of A? 

3. A problem on a Babylonian tablet requires finding the length 
and width of a rectangle given that the length and the width 
add up to 10, while the length and one-fourth of the width add 
up to 7. The solution provided on the tablet consists of the 
following four statements: 

Multiply 7 by 4 to obtain 28. 

Take away 10 from 28 to obtain 18. 

Take one-third of 18 to obtain 6, the length. 

Take away 6 from 10 to obtain 4, the width. 

Explain how these steps lead to the answer. 

4. The following two problems are from “The Nine Chapters of 
the Mathematical Art.” Solve them using the array technique 
described in Example 3. 

(a) Five oxen and two sheep are worth 10 units and two oxen 
and five sheep are worth 8 units. What is the value of each 
ox and sheep? 

(b) There are three kinds of corn. The grains contained in two, 
three, and four bundles, respectively, of these three classes 
of corn, are not sufficient to make a whole measure. However,
if we added to them one bundle of the second, third,
and first classes, respectively, then the grains would become
one full measure in each case. How many measures of grain
does each bundle of the different classes contain?

5. The problem in part (a) is known as the “Flower of Thymaridas,”
named after a Pythagorean of the fourth century B.C.

(a) Given the $n$ numbers $a_1, a_2, \ldots, a_n$, solve for
$x_1, x_2, \ldots, x_n$ in the following linear system:

\[
\begin{aligned}
x_1 + x_2 + \cdots + x_n &= a_1 \\
x_1 + x_2 &= a_2 \\
x_1 + x_3 &= a_3 \\
&\;\;\vdots \\
x_1 + x_n &= a_n
\end{aligned}
\]

(b) Identify a problem in this exercise set that fits the pattern 
in part (a), and solve it using your general solution. 

6. For Example 5 from the Bakhshali Manuscript: 

(a) Express Equations (5) as a homogeneous linear system of 
three equations in four unknowns (x, y, z , and K) and show 
that the solution set has one arbitrary parameter. 

(b) Find the smallest solution for which all four variables are 
positive integers. 

(c) Show that the solution given in Example 5 is included 
among your solutions. 

7. Solve the problems posed in the following three epigrams, which 
appear in a collection entitled “The Greek Anthology,” com- 
piled in part by a scholar named Metrodorus around A.D. 500. 
Some of its 46 mathematical problems are believed to date as 
far back as 600 B.C. [Note: Before solving parts (a) and (c), you 
will have to formulate the question.] 

(a) I desire my two sons to receive the thousand staters of which 
I am possessed, but let the fifth part of the legitimate one’s 
share exceed by ten the fourth part of what falls to the ille- 
gitimate one. 

(b) Make me a crown weighing sixty minae, mixing gold and 
brass, and with them tin and much-wrought iron. Let the
gold and brass together form two-thirds, the gold and tin 
together three-fourths, and the gold and iron three-fifths. 
Tell me how much gold you must put in, how much brass, 
how much tin, and how much iron, so as to make the whole 
crown weigh sixty minae. 

(c) First person: I have what the second has and the third of 
what the third has. Second person: I have what the third 
has and the third of what the first has. Third person: And 
I have ten minae and the third of what the second has. 

Working with Technology 

The following exercises are designed to be solved using a technology
utility. Typically, this will be MATLAB, Mathematica, Maple,
Derive, or Mathcad, but it may also be some other type of linear 
algebra software or a scientific calculator with some linear algebra 
capabilities. For each exercise you will need to read the relevant 
documentation for the particular utility you are using. The goal 
of these exercises is to provide you with a basic proficiency with 
your technology utility. Once you have mastered the techniques 
in these exercises, you will be able to use your technology utility 
to solve many of the problems in the regular exercise sets. 
T1. (a) Solve Archimedes’ Cattle Problem using a symbolic algebra
program.

(b) The Cattle Problem has a second part in which two additional 
conditions are imposed. The first of these states that “When 
the white bulls mingled their number with the black, they stood 
firm, equal in depth and breadth.” This requires that W + B 
be a square number, that is, 1, 4, 9, 16, 25, and so on. Show 
that this requires that the values of k in Eq. (4) be restricted as 
follows: 

\[
k = 4{,}456{,}749\,r^2, \qquad r = 1, 2, 3, \ldots
\]
and find the smallest total number of cattle that satisfies this 
second condition. 

Remark The second condition imposed in the second part of 
the Cattle Problem states that “When the yellow and the dappled 
bulls were gathered into one herd, they stood in such a manner 
that their number, beginning from one, grew slowly greater ’til it 
completed a triangular figure.” This requires that the quantity 
Y + D be a triangular number, that is, a number of the form 1,
1 + 2, 1 + 2 + 3, 1 + 2 + 3 + 4, .... This final part of the problem
was not completely solved until 1965 when all 206,545 digits
of the smallest number of cattle that satisfies this condition were 
found using a computer. 

T2. The following problem is from “The Nine Chapters of the 
Mathematical Art” and determines a homogeneous linear system 
of five equations in six unknowns. Show that the system has in- 
finitely many solutions, and find the one for which the depth of 
the well and the lengths of the five ropes are the smallest possible 
positive integers. 

Suppose that five families share a well. Suppose further that 

2 of A’s ropes are short of the well’s depth by one of B’s ropes. 

3 of B’s ropes are short of the well’s depth by one of C’s ropes. 

4 of C’s ropes are short of the well’s depth by one of D’s ropes. 

5 of D’s ropes are short of the well’s depth by one of E’s ropes. 

6 of E’s ropes are short of the well’s depth by one of A’s ropes. 


10.3 Cubic Spline Interpolation 

In this section an artist’s drafting aid is used as a physical model for the mathematical 
problem of finding a curve that passes through specified points in the plane. The 
parameters of the curve are determined by solving a linear system of equations. 


Linear Systems 
Matrix Algebra 
Differential Calculus 


Curve Fitting

Fitting a curve through specified points in the plane is a common problem encountered in
analyzing experimental data, in ascertaining the relations among variables, and in design
work. A ubiquitous application is in the design and description of computer and printer
fonts, such as PostScript™ and TrueType™ fonts (Figure 10.3.1). In Figure 10.3.2
seven points in the xy-plane are displayed, and in Figure 10.3.4 a smooth curve has been
drawn that passes through them. A curve that passes through a set of points in the plane
is said to interpolate those points, and the curve is called an interpolating curve for those
points. The interpolating curve in Figure 10.3.4 was drawn with the aid of a drafting
spline (Figure 10.3.3). This drafting aid consists of a thin, flexible strip of wood or other
material that is bent to pass through the points to be interpolated. Attached sliding
weights hold the spline in position while the artist draws the interpolating curve. The
drafting spline will serve as the physical model for a mathematical theory of interpolation
that we will discuss in this section.

▲ Figure 10.3.2

Statement of the Problem

Suppose that we are given $n$ points in the xy-plane,

\[
(x_1, y_1),\ (x_2, y_2),\ \ldots,\ (x_n, y_n)
\]

which we wish to interpolate with a “well-behaved” curve (Figure 10.3.5). For convenience,
we take the points to be equally spaced in the x-direction, although our results
can easily be extended to the case of unequally spaced points. If we let the common
distance between the x-coordinates of the points be $h$, then we have

\[
x_2 - x_1 = x_3 - x_2 = \cdots = x_n - x_{n-1} = h
\]

Let $y = S(x)$, $x_1 \le x \le x_n$, denote the interpolating curve that we seek. We assume that
this curve describes the displacement of a drafting spline that interpolates the $n$ points
when the weights holding down the spline are situated precisely at the $n$ points. It is
known from linear beam theory that for small displacements, the fourth derivative of the
displacement of a beam is zero along any interval of the x-axis that contains no external
forces acting on the beam. If we treat our drafting spline as a thin beam and realize that
the only external forces acting on it arise from the weights at the $n$ specified points, then
it follows that

\[
S^{(iv)}(x) = 0 \tag{1}
\]

for values of $x$ lying in the $n - 1$ open intervals

\[
(x_1, x_2),\ (x_2, x_3),\ \ldots,\ (x_{n-1}, x_n)
\]

between the $n$ points.
We also need the result from linear beam theory that states that for a beam acted
upon only by external forces, the displacement must have two continuous derivatives.
In the case of the interpolating curve $y = S(x)$ constructed by the drafting spline, this
means that $S(x)$, $S'(x)$, and $S''(x)$ must be continuous for $x_1 \le x \le x_n$.

The condition that S"(x) be continuous is what causes a drafting spline to produce 
a pleasing curve, as it results in continuous curvature. The eye can perceive sudden 
changes in curvature — that is, discontinuities in S"(x ) — but sudden changes in higher 
derivatives are not discernible. Thus, the condition that S"(x) be continuous is the 
minimal prerequisite for the interpolating curve to be perceptible as a single smooth 
curve, rather than as a series of separate curves pieced together. 

To determine the mathematical form of the function $S(x)$, we observe that because
$S^{(iv)}(x) = 0$ in the intervals between the $n$ specified points, it follows by integrating this
equation four times that $S(x)$ must be a cubic polynomial in $x$ in each such interval. In
general, however, $S(x)$ will be a different cubic polynomial in each interval, so $S(x)$ must
have the form

\[
S(x) =
\begin{cases}
S_1(x), & x_1 \le x \le x_2 \\
S_2(x), & x_2 \le x \le x_3 \\
\quad\vdots \\
S_{n-1}(x), & x_{n-1} \le x \le x_n
\end{cases}
\tag{2}
\]

where $S_1(x), S_2(x), \ldots, S_{n-1}(x)$ are cubic polynomials. For convenience, we will write
these in the form

\[
\begin{aligned}
S_1(x) &= a_1(x - x_1)^3 + b_1(x - x_1)^2 + c_1(x - x_1) + d_1, & x_1 \le x \le x_2 \\
S_2(x) &= a_2(x - x_2)^3 + b_2(x - x_2)^2 + c_2(x - x_2) + d_2, & x_2 \le x \le x_3 \\
&\;\;\vdots \\
S_{n-1}(x) &= a_{n-1}(x - x_{n-1})^3 + b_{n-1}(x - x_{n-1})^2 + c_{n-1}(x - x_{n-1}) + d_{n-1}, & x_{n-1} \le x \le x_n
\end{aligned}
\tag{3}
\]

The $a_i$'s, $b_i$'s, $c_i$'s, and $d_i$'s constitute a total of $4n - 4$ coefficients that we must determine
to specify $S(x)$ completely. If we choose these coefficients so that $S(x)$ interpolates the $n$
specified points in the plane and $S(x)$, $S'(x)$, and $S''(x)$ are continuous, then the resulting
interpolating curve is called a cubic spline.


Derivation of the Formula 
of a Cubic Spline 


From Equations (2) and (3), we have

\[
\begin{aligned}
S(x) &= S_1(x) = a_1(x - x_1)^3 + b_1(x - x_1)^2 + c_1(x - x_1) + d_1, & x_1 \le x \le x_2 \\
S(x) &= S_2(x) = a_2(x - x_2)^3 + b_2(x - x_2)^2 + c_2(x - x_2) + d_2, & x_2 \le x \le x_3 \\
&\;\;\vdots \\
S(x) &= S_{n-1}(x) = a_{n-1}(x - x_{n-1})^3 + b_{n-1}(x - x_{n-1})^2 + c_{n-1}(x - x_{n-1}) + d_{n-1}, & x_{n-1} \le x \le x_n
\end{aligned}
\tag{4}
\]

so

\[
\begin{aligned}
S'(x) &= S_1'(x) = 3a_1(x - x_1)^2 + 2b_1(x - x_1) + c_1, & x_1 \le x \le x_2 \\
S'(x) &= S_2'(x) = 3a_2(x - x_2)^2 + 2b_2(x - x_2) + c_2, & x_2 \le x \le x_3 \\
&\;\;\vdots \\
S'(x) &= S_{n-1}'(x) = 3a_{n-1}(x - x_{n-1})^2 + 2b_{n-1}(x - x_{n-1}) + c_{n-1}, & x_{n-1} \le x \le x_n
\end{aligned}
\tag{5}
\]

and

\[
\begin{aligned}
S''(x) &= S_1''(x) = 6a_1(x - x_1) + 2b_1, & x_1 \le x \le x_2 \\
S''(x) &= S_2''(x) = 6a_2(x - x_2) + 2b_2, & x_2 \le x \le x_3 \\
&\;\;\vdots \\
S''(x) &= S_{n-1}''(x) = 6a_{n-1}(x - x_{n-1}) + 2b_{n-1}, & x_{n-1} \le x \le x_n
\end{aligned}
\tag{6}
\]

We will now use these equations and the four properties of cubic splines stated below
to express the unknown coefficients $a_i$, $b_i$, $c_i$, $d_i$, $i = 1, 2, \ldots, n - 1$, in terms of the
known coordinates $y_1, y_2, \ldots, y_n$.


1. $S(x)$ interpolates the points $(x_i, y_i)$, $i = 1, 2, \ldots, n$.

Because $S(x)$ interpolates the points $(x_i, y_i)$, $i = 1, 2, \ldots, n$, we have

\[
S(x_1) = y_1,\quad S(x_2) = y_2,\quad \ldots,\quad S(x_n) = y_n \tag{7}
\]

From the first $n - 1$ of these equations and (4), we obtain

\[
\begin{aligned}
d_1 &= y_1 \\
d_2 &= y_2 \\
&\;\;\vdots \\
d_{n-1} &= y_{n-1}
\end{aligned}
\tag{8}
\]

From the last equation in (7), the last equation in (4), and the fact that $x_n - x_{n-1} = h$,
we obtain

\[
a_{n-1}h^3 + b_{n-1}h^2 + c_{n-1}h + d_{n-1} = y_n \tag{9}
\]

2. $S(x)$ is continuous on $[x_1, x_n]$.

Because $S(x)$ is continuous for $x_1 \le x \le x_n$, it follows that at each point $x_i$ in the
set $x_2, x_3, \ldots, x_{n-1}$ we must have

\[
S_{i-1}(x_i) = S_i(x_i), \qquad i = 2, 3, \ldots, n - 1 \tag{10}
\]

Otherwise, the graphs of $S_{i-1}(x)$ and $S_i(x)$ would not join together to form a
continuous curve at $x_i$. When we apply the interpolating property $S_i(x_i) = y_i$, it follows
from (10) that $S_{i-1}(x_i) = y_i$, $i = 2, 3, \ldots, n - 1$, or from (4) that

\[
\begin{aligned}
a_1h^3 + b_1h^2 + c_1h + d_1 &= y_2 \\
a_2h^3 + b_2h^2 + c_2h + d_2 &= y_3 \\
&\;\;\vdots \\
a_{n-2}h^3 + b_{n-2}h^2 + c_{n-2}h + d_{n-2} &= y_{n-1}
\end{aligned}
\tag{11}
\]

3. $S'(x)$ is continuous on $[x_1, x_n]$.

Because $S'(x)$ is continuous for $x_1 \le x \le x_n$, it follows that

\[
S_{i-1}'(x_i) = S_i'(x_i), \qquad i = 2, 3, \ldots, n - 1
\]

or, from (5),

\[
\begin{aligned}
3a_1h^2 + 2b_1h + c_1 &= c_2 \\
3a_2h^2 + 2b_2h + c_2 &= c_3 \\
&\;\;\vdots \\
3a_{n-2}h^2 + 2b_{n-2}h + c_{n-2} &= c_{n-1}
\end{aligned}
\tag{12}
\]

4. $S''(x)$ is continuous on $[x_1, x_n]$.

Because $S''(x)$ is continuous for $x_1 \le x \le x_n$, it follows that

\[
S_{i-1}''(x_i) = S_i''(x_i), \qquad i = 2, 3, \ldots, n - 1
\]

or, from (6),

\[
\begin{aligned}
6a_1h + 2b_1 &= 2b_2 \\
6a_2h + 2b_2 &= 2b_3 \\
&\;\;\vdots \\
6a_{n-2}h + 2b_{n-2} &= 2b_{n-1}
\end{aligned}
\tag{13}
\]

Equations (8), (9), (11), (12), and (13) constitute a system of $4n - 6$ linear equations
in the $4n - 4$ unknown coefficients $a_i$, $b_i$, $c_i$, $d_i$, $i = 1, 2, \ldots, n - 1$. Consequently,
we need two more equations to determine these coefficients uniquely. Before obtaining
these additional equations, however, we can simplify our existing system by expressing
the unknowns $a_i$, $b_i$, $c_i$, and $d_i$ in terms of new unknown quantities

\[
M_1 = S''(x_1),\quad M_2 = S''(x_2),\quad \ldots,\quad M_n = S''(x_n)
\]

and the known quantities

\[
y_1,\ y_2,\ \ldots,\ y_n
\]

For example, from (6) it follows that

\[
\begin{aligned}
M_1 &= 2b_1 \\
M_2 &= 2b_2 \\
&\;\;\vdots \\
M_{n-1} &= 2b_{n-1}
\end{aligned}
\]

so

\[
b_1 = \tfrac{1}{2}M_1,\quad b_2 = \tfrac{1}{2}M_2,\quad \ldots,\quad b_{n-1} = \tfrac{1}{2}M_{n-1}
\]

Moreover, we already know from (8) that

\[
d_1 = y_1,\quad d_2 = y_2,\quad \ldots,\quad d_{n-1} = y_{n-1}
\]

We leave it as an exercise for you to derive the expressions for the $a_i$'s and $c_i$'s in terms
of the $M_i$'s and $y_i$'s. The final result is as follows:




Cubic Spline Interpolation

Given $n$ points $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ with $x_{i+1} - x_i = h$, $i = 1, 2, \ldots, n - 1$,
the cubic spline

\[
S(x) =
\begin{cases}
a_1(x - x_1)^3 + b_1(x - x_1)^2 + c_1(x - x_1) + d_1, & x_1 \le x \le x_2 \\
a_2(x - x_2)^3 + b_2(x - x_2)^2 + c_2(x - x_2) + d_2, & x_2 \le x \le x_3 \\
\quad\vdots \\
a_{n-1}(x - x_{n-1})^3 + b_{n-1}(x - x_{n-1})^2 + c_{n-1}(x - x_{n-1}) + d_{n-1}, & x_{n-1} \le x \le x_n
\end{cases}
\]

that interpolates these points has coefficients given by

\[
\begin{aligned}
a_i &= (M_{i+1} - M_i)/6h \\
b_i &= M_i/2 \\
c_i &= (y_{i+1} - y_i)/h - [(M_{i+1} + 2M_i)h/6] \\
d_i &= y_i
\end{aligned}
\tag{14}
\]

for $i = 1, 2, \ldots, n - 1$, where $M_i = S''(x_i)$, $i = 1, 2, \ldots, n$.


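From this result, we see that the quantities $M_1, M_2, \ldots, M_n$ uniquely determine the
cubic spline. To find these quantities, we substitute the expressions for $a_i$, $b_i$, and $c_i$
given in (14) into (12). After some algebraic simplification, we obtain

\[
\begin{aligned}
M_1 + 4M_2 + M_3 &= 6(y_1 - 2y_2 + y_3)/h^2 \\
M_2 + 4M_3 + M_4 &= 6(y_2 - 2y_3 + y_4)/h^2 \\
&\;\;\vdots \\
M_{n-2} + 4M_{n-1} + M_n &= 6(y_{n-2} - 2y_{n-1} + y_n)/h^2
\end{aligned}
\tag{15}
\]

or, in matrix form,

\[
\begin{bmatrix}
1 & 4 & 1 & 0 & \cdots & 0 & 0 & 0 & 0 \\
0 & 1 & 4 & 1 & \cdots & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 4 & \cdots & 0 & 0 & 0 & 0 \\
\vdots & \vdots & \vdots & \vdots & & \vdots & \vdots & \vdots & \vdots \\
0 & 0 & 0 & 0 & \cdots & 4 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & \cdots & 1 & 4 & 1 & 0 \\
0 & 0 & 0 & 0 & \cdots & 0 & 1 & 4 & 1
\end{bmatrix}
\begin{bmatrix}
M_1 \\ M_2 \\ M_3 \\ M_4 \\ \vdots \\ M_{n-3} \\ M_{n-2} \\ M_{n-1} \\ M_n
\end{bmatrix}
= \frac{6}{h^2}
\begin{bmatrix}
y_1 - 2y_2 + y_3 \\
y_2 - 2y_3 + y_4 \\
y_3 - 2y_4 + y_5 \\
\vdots \\
y_{n-4} - 2y_{n-3} + y_{n-2} \\
y_{n-3} - 2y_{n-2} + y_{n-1} \\
y_{n-2} - 2y_{n-1} + y_n
\end{bmatrix}
\]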
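The algebraic simplification that leads from (12) and (14) to (15) is not written out in the text; one way it can go is sketched below. The generic index $i$, running over $1, 2, \ldots, n - 2$, is the only notation added here.

```latex
% Left side of the i-th equation in (12), with a_i, b_i, c_i taken from (14):
\begin{align*}
3a_i h^2 + 2b_i h + c_i
  &= \frac{(M_{i+1} - M_i)h}{2} + M_i h
     + \frac{y_{i+1} - y_i}{h} - \frac{(M_{i+1} + 2M_i)h}{6} \\
  &= \frac{h}{6}\,(M_i + 2M_{i+1}) + \frac{y_{i+1} - y_i}{h}.
\end{align*}
% Right side of the i-th equation in (12), again from (14):
\begin{align*}
c_{i+1} &= \frac{y_{i+2} - y_{i+1}}{h} - \frac{(M_{i+2} + 2M_{i+1})h}{6}.
\end{align*}
% Equating the two sides and multiplying through by 6/h:
\begin{align*}
M_i + 4M_{i+1} + M_{i+2} &= \frac{6\,(y_i - 2y_{i+1} + y_{i+2})}{h^2},
  \qquad i = 1, 2, \ldots, n - 2.
\end{align*}
```

Taking $i = 1, 2, \ldots, n - 2$ in the last line reproduces the $n - 2$ equations of (15) one row at a time.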


This is a linear system of $n - 2$ equations for the $n$ unknowns $M_1, M_2, \ldots, M_n$. Thus, we
still need two additional equations to determine $M_1, M_2, \ldots, M_n$ uniquely. The reason
for this is that there are infinitely many cubic splines that interpolate the given points,
so we simply do not have enough conditions to determine a unique cubic spline passing
through the points. We discuss below three possible ways of specifying the two additional
conditions required to obtain a unique cubic spline through the points. (The exercises
present two more.) They are summarized in Table 1.
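Table 1

Natural Spline
The second derivative of the spline is zero at the endpoints.
End conditions: $M_1 = 0$, $M_n = 0$
Linear system:

\[
\begin{bmatrix}
4 & 1 & 0 & \cdots & 0 & 0 & 0 \\
1 & 4 & 1 & \cdots & 0 & 0 & 0 \\
\vdots & \vdots & \vdots & & \vdots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & 1 & 4 & 1 \\
0 & 0 & 0 & \cdots & 0 & 1 & 4
\end{bmatrix}
\begin{bmatrix}
M_2 \\ M_3 \\ \vdots \\ M_{n-2} \\ M_{n-1}
\end{bmatrix}
= \frac{6}{h^2}
\begin{bmatrix}
y_1 - 2y_2 + y_3 \\
y_2 - 2y_3 + y_4 \\
\vdots \\
y_{n-3} - 2y_{n-2} + y_{n-1} \\
y_{n-2} - 2y_{n-1} + y_n
\end{bmatrix}
\]

Parabolic Runout Spline
The spline reduces to a parabolic curve on the first and last intervals.
End conditions: $M_1 = M_2$, $M_n = M_{n-1}$
Linear system:

\[
\begin{bmatrix}
5 & 1 & 0 & \cdots & 0 & 0 & 0 \\
1 & 4 & 1 & \cdots & 0 & 0 & 0 \\
\vdots & \vdots & \vdots & & \vdots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & 1 & 4 & 1 \\
0 & 0 & 0 & \cdots & 0 & 1 & 5
\end{bmatrix}
\begin{bmatrix}
M_2 \\ M_3 \\ \vdots \\ M_{n-2} \\ M_{n-1}
\end{bmatrix}
= \frac{6}{h^2}
\begin{bmatrix}
y_1 - 2y_2 + y_3 \\
y_2 - 2y_3 + y_4 \\
\vdots \\
y_{n-3} - 2y_{n-2} + y_{n-1} \\
y_{n-2} - 2y_{n-1} + y_n
\end{bmatrix}
\]

Cubic Runout Spline
The spline is a single cubic curve on the first two and last two intervals.
End conditions: $M_1 = 2M_2 - M_3$, $M_n = 2M_{n-1} - M_{n-2}$
Linear system:

\[
\begin{bmatrix}
6 & 0 & 0 & \cdots & 0 & 0 & 0 \\
1 & 4 & 1 & \cdots & 0 & 0 & 0 \\
\vdots & \vdots & \vdots & & \vdots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & 1 & 4 & 1 \\
0 & 0 & 0 & \cdots & 0 & 0 & 6
\end{bmatrix}
\begin{bmatrix}
M_2 \\ M_3 \\ \vdots \\ M_{n-2} \\ M_{n-1}
\end{bmatrix}
= \frac{6}{h^2}
\begin{bmatrix}
y_1 - 2y_2 + y_3 \\
y_2 - 2y_3 + y_4 \\
\vdots \\
y_{n-3} - 2y_{n-2} + y_{n-1} \\
y_{n-2} - 2y_{n-1} + y_n
\end{bmatrix}
\]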
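The three systems in Table 1 differ only in their corner entries, so they can be assembled by one routine. The following Python sketch assumes NumPy is available, equally spaced points, and at least four data points; the function names `spline_second_derivatives`, `spline_coefficients`, and `evaluate_spline` are illustrative and not part of the text. A dense solver is used for brevity, although the matrix is tridiagonal.

```python
import numpy as np

def spline_second_derivatives(y, h, end="natural"):
    """Return M_1, ..., M_n for the end condition named in Table 1."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    if n < 4:
        raise ValueError("this sketch assumes at least four points")
    m = n - 2
    # Interior rows 1 4 1, as in (15) after eliminating M_1 and M_n.
    A = 4.0 * np.eye(m) + np.eye(m, k=1) + np.eye(m, k=-1)
    rhs = 6.0 * (y[:-2] - 2.0 * y[1:-1] + y[2:]) / h**2
    if end == "parabolic":
        A[0, 0] = A[-1, -1] = 5.0
    elif end == "cubic":
        A[0, 0] = A[-1, -1] = 6.0
        A[0, 1] = A[-1, -2] = 0.0
    elif end != "natural":
        raise ValueError("end must be 'natural', 'parabolic', or 'cubic'")
    M = np.empty(n)
    M[1:-1] = np.linalg.solve(A, rhs)
    # Recover the two eliminated unknowns from the end conditions.
    if end == "natural":
        M[0] = M[-1] = 0.0
    elif end == "parabolic":
        M[0], M[-1] = M[1], M[-2]
    else:
        M[0], M[-1] = 2 * M[1] - M[2], 2 * M[-2] - M[-3]
    return M

def spline_coefficients(y, h, M):
    """Coefficients a_i, b_i, c_i, d_i from formula (14)."""
    y = np.asarray(y, dtype=float)
    a = (M[1:] - M[:-1]) / (6 * h)
    b = M[:-1] / 2
    c = (y[1:] - y[:-1]) / h - (M[1:] + 2 * M[:-1]) * h / 6
    d = y[:-1]
    return a, b, c, d

def evaluate_spline(x, x1, h, coeffs):
    """Evaluate S(x) on [x_1, x_n], where x1 is the first node."""
    a, b, c, d = coeffs
    i = int(min(max((x - x1) // h, 0), len(a) - 1))   # interval index
    t = x - (x1 + i * h)
    return ((a[i] * t + b[i]) * t + c[i]) * t + d[i]

# Usage: five equally spaced samples of y = x^3 with h = 1.
xs = np.arange(5.0)
ys = xs**3
M = spline_second_derivatives(ys, 1.0, end="cubic")
print(evaluate_spline(2.5, 0.0, 1.0, spline_coefficients(ys, 1.0, M)))
```

Since the sample data lie on a single cubic, the cubic runout spline should reproduce it up to rounding, which makes this a convenient sanity check of the corner entries.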


The Natural Spline

The two simplest mathematical conditions we can impose are

\[
M_1 = M_n = 0
\]

These conditions together with (15) result in an $n \times n$ linear system for $M_1, M_2, \ldots, M_n$,
which can be written in matrix form as

\[
\begin{bmatrix}
1 & 0 & 0 & 0 & \cdots & 0 & 0 & 0 \\
1 & 4 & 1 & 0 & \cdots & 0 & 0 & 0 \\
0 & 1 & 4 & 1 & \cdots & 0 & 0 & 0 \\
\vdots & \vdots & \vdots & \vdots & & \vdots & \vdots & \vdots \\
0 & 0 & 0 & 0 & \cdots & 1 & 4 & 1 \\
0 & 0 & 0 & 0 & \cdots & 0 & 0 & 1
\end{bmatrix}
\begin{bmatrix}
M_1 \\ M_2 \\ M_3 \\ \vdots \\ M_{n-1} \\ M_n
\end{bmatrix}
= \frac{6}{h^2}
\begin{bmatrix}
0 \\
y_1 - 2y_2 + y_3 \\
y_2 - 2y_3 + y_4 \\
\vdots \\
y_{n-2} - 2y_{n-1} + y_n \\
0
\end{bmatrix}
\]

For numerical calculations it is more convenient to eliminate $M_1$ and $M_n$ from this system
and write

\[
\begin{bmatrix}
4 & 1 & 0 & 0 & \cdots & 0 & 0 & 0 \\
1 & 4 & 1 & 0 & \cdots & 0 & 0 & 0 \\
0 & 1 & 4 & 1 & \cdots & 0 & 0 & 0 \\
\vdots & \vdots & \vdots & \vdots & & \vdots & \vdots & \vdots \\
0 & 0 & 0 & 0 & \cdots & 1 & 4 & 1 \\
0 & 0 & 0 & 0 & \cdots & 0 & 1 & 4
\end{bmatrix}
\begin{bmatrix}
M_2 \\ M_3 \\ M_4 \\ \vdots \\ M_{n-2} \\ M_{n-1}
\end{bmatrix}
= \frac{6}{h^2}
\begin{bmatrix}
y_1 - 2y_2 + y_3 \\
y_2 - 2y_3 + y_4 \\
y_3 - 2y_4 + y_5 \\
\vdots \\
y_{n-3} - 2y_{n-2} + y_{n-1} \\
y_{n-2} - 2y_{n-1} + y_n
\end{bmatrix}
\tag{16}
\]

together with

\begin{align}
M_1 &= 0 \tag{17} \\
M_n &= 0 \tag{18}
\end{align}

Thus, the $(n - 2) \times (n - 2)$ linear system can be solved for the $n - 2$ coefficients $M_2$,
$M_3, \ldots, M_{n-1}$, and $M_1$ and $M_n$ are determined by (17) and (18).

Physically, the natural spline results when the ends of a drafting spline extend freely
beyond the interpolating points without constraint. The end portions of the spline
outside the interpolating points will fall on straight line paths, causing $S''(x)$ to vanish
at the endpoints $x_1$ and $x_n$ and resulting in the mathematical conditions $M_1 = M_n = 0$.

The natural spline tends to flatten the interpolating curve at the endpoints, which
may be undesirable. Of course, if it is required that $S''(x)$ vanish at the endpoints, then
the natural spline must be used.

The Parabolic Runout Spline

The two additional constraints imposed for this type of spline are

\begin{align}
M_1 &= M_2 \tag{19} \\
M_n &= M_{n-1} \tag{20}
\end{align}

If we use the preceding two equations to eliminate $M_1$ and $M_n$ from (15), we obtain the
$(n - 2) \times (n - 2)$ linear system

\[
\begin{bmatrix}
5 & 1 & 0 & 0 & \cdots & 0 & 0 & 0 \\
1 & 4 & 1 & 0 & \cdots & 0 & 0 & 0 \\
0 & 1 & 4 & 1 & \cdots & 0 & 0 & 0 \\
\vdots & \vdots & \vdots & \vdots & & \vdots & \vdots & \vdots \\
0 & 0 & 0 & 0 & \cdots & 1 & 4 & 1 \\
0 & 0 & 0 & 0 & \cdots & 0 & 1 & 5
\end{bmatrix}
\begin{bmatrix}
M_2 \\ M_3 \\ M_4 \\ \vdots \\ M_{n-2} \\ M_{n-1}
\end{bmatrix}
= \frac{6}{h^2}
\begin{bmatrix}
y_1 - 2y_2 + y_3 \\
y_2 - 2y_3 + y_4 \\
y_3 - 2y_4 + y_5 \\
\vdots \\
y_{n-3} - 2y_{n-2} + y_{n-1} \\
y_{n-2} - 2y_{n-1} + y_n
\end{bmatrix}
\tag{21}
\]

for $M_2, M_3, \ldots, M_{n-1}$. Once these $n - 2$ values have been determined, $M_1$ and $M_n$ are
determined from (19) and (20).

From (14) we see that $M_1 = M_2$ implies that $a_1 = 0$, and $M_n = M_{n-1}$ implies that
$a_{n-1} = 0$. Thus, from (3) there are no cubic terms in the formula for the spline over the
end intervals $[x_1, x_2]$ and $[x_{n-1}, x_n]$. Hence, as the name suggests, the parabolic runout
spline reduces to a parabolic curve over these end intervals.

The Cubic Runout Spline

For this type of spline, we impose the two additional conditions

Mi = 2 M 2 - Mi ( 22 ) 

Mn = 2M„_j - M„_ 2 (23) 


Using these two equations to eliminate M\ and M„ from (15) results in the following 
(n — 2) x (n — 2) linear system for M2, M3, ... , M„_ p 


"6 0 0 0- 

0 

0 

0 


" M 2 ' 


yi - 2y 2 + y3 

14 10- 

0 

0 

0 


Mi 


J2 — 2 ys + y 4 

0 14 1- 

• 0 0 0 


m 4 

6 

y 3 - 2 y 4 + y 5 

0 0 0 0- 

■ 1 4 1 


M n - 2 

~ h 2 

yn-i — 2y n -2 + y n - 1 

0 0 0 0- 

■ 0 0 6 


Mn—\ 


_ y „-2 - 2y n ~i + y n _ 


After we solve this linear system for M 2 , M 3 , . . . , M„_i, we can use (22) and (23) to 
determine M\ and M„ . 

If we rewrite (22) as 

Mi — M\ = Mi — Mi 

it follows from (14) that a\ = a 2 . Because S'"(x) — 6ci\ on [x\ , x 2 \ and S'" (x) — 6a 2 on 
[x 2 , X3], we see that S"'(x) is constant over the entire interval [jci , JC3]. Consequently, 
.S(x) consists of a single cubic curve over the interval [xi, x 3 ] rather than two different 
cubic curves pieced together at x 2 . [To see this, integrate S"'(x) three times.] A similar 
analysis shows that S(x) consists of a single cubic curve over the last two intervals. 
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Whereas the natural spline tends to produce an interpolating curve that is flat at the 
endpoints, the cubic runout spline has the opposite tendency: it produces a curve with 
pronounced curvature at the endpoints. If neither behavior is desired, the parabolic 
runout spline is a reasonable compromise. 
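The three sets of end conditions differ only in the first and last rows of the coefficient matrix and in how $M_1$ and $M_n$ are recovered afterward. The following Python sketch illustrates this under stated assumptions: the function names `solve_linear` and `second_derivatives`, the string labels for the spline types, and the sample data are all hypothetical and are not part of the text; at least four equally spaced points are assumed.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the caller's data is left untouched.
    aug = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def second_derivatives(y, h, kind):
    """Return [M_1, ..., M_n] for equally spaced data y with spacing h.

    kind is "natural", "parabolic", or "cubic" (hypothetical labels for
    the natural, parabolic runout, and cubic runout splines).
    """
    if kind not in ("natural", "parabolic", "cubic"):
        raise ValueError("unknown spline type")
    n = len(y)
    if n < 4:
        raise ValueError("this sketch assumes at least four data points")
    m = n - 2  # number of interior unknowns M_2, ..., M_{n-1}
    # Interior pattern of every system: M_{i-1} + 4 M_i + M_{i+1}.
    A = [[0.0] * m for _ in range(m)]
    for r in range(m):
        A[r][r] = 4.0
        if r > 0:
            A[r][r - 1] = 1.0
        if r < m - 1:
            A[r][r + 1] = 1.0
    # The end conditions change only the first and last rows.
    if kind == "parabolic":
        A[0][0] = A[m - 1][m - 1] = 5.0
    elif kind == "cubic":
        A[0][0] = A[m - 1][m - 1] = 6.0
        A[0][1] = A[m - 1][m - 2] = 0.0
    rhs = [6.0 * (y[i] - 2.0 * y[i + 1] + y[i + 2]) / h**2 for i in range(m)]
    inner = solve_linear(A, rhs)
    # Recover M_1 and M_n from the end conditions.
    if kind == "natural":
        first, last = 0.0, 0.0
    elif kind == "parabolic":
        first, last = inner[0], inner[-1]
    else:
        first = 2.0 * inner[0] - inner[1]
        last = 2.0 * inner[-1] - inner[-2]
    return [first] + inner + [last]


# Illustrative data only (not taken from the text).
sample_y = [1.0, 2.0, 0.0, 3.0, 1.0, 2.0]
for kind in ("natural", "parabolic", "cubic"):
    print(kind, [round(v, 3) for v in second_derivatives(sample_y, 0.5, kind)])
```

For any of the three types, the returned values can be substituted back into (15) as a check; a dense solver is used here for brevity, although the tridiagonal structure would allow a faster specialized elimination.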

► EXAMPLE 1 Using a Parabolic Runout Spline

The density of water is well known to reach a maximum at a temperature slightly above
freezing. Table 2, from the Handbook of Chemistry and Physics (CRC Press, 2009), gives
the density of water in grams per cubic centimeter for five equally spaced temperatures
from −10°C to 30°C. We will interpolate these five temperature-density measurements
with a parabolic runout spline and attempt to find the maximum density of water in
this range by finding the maximum value on this cubic spline. In the exercises we ask
you to perform similar calculations using a natural spline and a cubic runout spline to
interpolate the data points.

Table 2

Temperature (°C)    Density (g/cm^3)
     −10                .99815
       0                .99987
      10                .99973
      20                .99823
      30                .99567

Set

\[
\begin{aligned}
x_1 &= -10, & y_1 &= .99815 \\
x_2 &= 0,   & y_2 &= .99987 \\
x_3 &= 10,  & y_3 &= .99973 \\
x_4 &= 20,  & y_4 &= .99823 \\
x_5 &= 30,  & y_5 &= .99567
\end{aligned}
\]

so that $h = 10$. Then

\[
\begin{aligned}
6[y_1 - 2y_2 + y_3]/h^2 &= -.0001116 \\
6[y_2 - 2y_3 + y_4]/h^2 &= -.0000816 \\
6[y_3 - 2y_4 + y_5]/h^2 &= -.0000636
\end{aligned}
\]

and the linear system (21) for the parabolic runout spline becomes

\[
\begin{bmatrix} 5 & 1 & 0 \\ 1 & 4 & 1 \\ 0 & 1 & 5 \end{bmatrix}
\begin{bmatrix} M_2 \\ M_3 \\ M_4 \end{bmatrix}
=
\begin{bmatrix} -.0001116 \\ -.0000816 \\ -.0000636 \end{bmatrix}
\]

Solving this system yields

\[
\begin{aligned}
M_2 &= -.00001973 \\
M_3 &= -.00001293 \\
M_4 &= -.00001013
\end{aligned}
\]

From (19) and (20), we have

\[
\begin{aligned}
M_1 &= M_2 = -.00001973 \\
M_5 &= M_4 = -.00001013
\end{aligned}
\]

Solving for the $a_i$'s, $b_i$'s, $c_i$'s, and $d_i$'s in (14), we obtain the following expression for the
interpolating parabolic runout spline:

\[
S(x) = \begin{cases}
-.00000987(x + 10)^2 + .0002707(x + 10) + .99815, & -10 \le x \le 0 \\
.000000113(x - 0)^3 - .00000987(x - 0)^2 + .0000733(x - 0) + .99987, & 0 \le x \le 10 \\
.000000047(x - 10)^3 - .00000647(x - 10)^2 - .0000900(x - 10) + .99973, & 10 \le x \le 20 \\
-.00000507(x - 20)^2 - .0002053(x - 20) + .99823, & 20 \le x \le 30
\end{cases}
\]

This spline is plotted in Figure 10.3.6. From that figure we see that the maximum is
attained in the interval [0, 10]. To find this maximum, we set $S'(x)$ equal to zero in the
interval [0, 10]:

\[ S'(x) = .000000339x^2 - .0000197x + .0000733 = 0 \]

To three significant digits the root of this quadratic in the interval [0, 10] is $x = 3.99$,
and for this value of $x$, $S(3.99) = 1.00001$. Thus, according to our interpolated estimate,
the maximum density of water is 1.00001 g/cm^3 attained at 3.99°C. This agrees well with
the experimental maximum density of 1.00000 g/cm^3 attained at 3.98°C. (In the original
metric system, the gram was defined as the mass of one cubic centimeter of water at its
maximum density.)
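The arithmetic of this example can be retraced in a few lines of Python. This is only a sketch: the variable names are hypothetical, and the coefficient formulas used for $a_i$, $b_i$, $c_i$, $d_i$ are the ones assumed to be meant by (14), namely $a_i = (M_{i+1} - M_i)/(6h)$, $b_i = M_i/2$, $c_i = (y_{i+1} - y_i)/h - (M_{i+1} + 2M_i)h/6$, $d_i = y_i$; they reproduce the coefficients printed above.

```python
import math

# Table 2 data: temperatures (deg C) and densities (g/cm^3), spacing h = 10.
x = [-10.0, 0.0, 10.0, 20.0, 30.0]
y = [0.99815, 0.99987, 0.99973, 0.99823, 0.99567]
h = 10.0

# Right-hand sides 6 (y_{i-1} - 2 y_i + y_{i+1}) / h^2 for i = 2, 3, 4.
r = [6.0 * (y[i - 1] - 2.0 * y[i] + y[i + 1]) / h**2 for i in (1, 2, 3)]

# Solve [5 1 0; 1 4 1; 0 1 5] [M2, M3, M4]^T = r by hand elimination:
# rows 1 and 3 give M2 and M4 in terms of M3, which row 2 then determines.
M3 = (r[1] - (r[0] + r[2]) / 5.0) / (4.0 - 2.0 / 5.0)
M2 = (r[0] - M3) / 5.0
M4 = (r[2] - M3) / 5.0
M = [M2, M2, M3, M4, M4]  # end conditions M1 = M2 and M5 = M4

# Coefficients on [x_2, x_3] = [0, 10] (assumed form of the formulas in (14)).
i = 1
a = (M[i + 1] - M[i]) / (6.0 * h)
b = M[i] / 2.0
c = (y[i + 1] - y[i]) / h - (M[i + 1] + 2.0 * M[i]) * h / 6.0
d = y[i]

# S'(t) = 3 a t^2 + 2 b t + c with t = x - x_2; take the root lying in [0, 10].
disc = math.sqrt((2.0 * b) ** 2 - 4.0 * (3.0 * a) * c)
t_max = (-2.0 * b - disc) / (2.0 * 3.0 * a)
s_max = ((a * t_max + b) * t_max + c) * t_max + d
x_max = x[i] + t_max
print(round(x_max, 2), round(s_max, 5))
```

Because the script carries full floating-point precision rather than the rounded coefficients shown in the text, its intermediate values may differ from the printed ones in the last digit.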



[Figure 10.3.6: plot of the interpolating spline; horizontal axis: Temperature (°C), from −10 to 30]


Closing Remarks In addition to producing excellent interpolating curves, cubic splines and their
generalizations are useful for numerical integration and differentiation, for the numerical
solution of differential and integral equations, and in optimization theory.


Exercise Set 10.3

1. Derive the expressions for $a_i$ and $c_i$ in Equations (14) of Theorem 10.3.1.

2. The six points

(0, .00000), (.2, .19867), (.4, .38942),
(.6, .56464), (.8, .71736), (1.0, .84147)

lie on the graph of $y = \sin x$, where $x$ is in radians.

(a) Find the portion of the parabolic runout spline that interpolates
these six points for $.4 \le x \le .6$. Maintain an accuracy
of five decimal places in your calculations.

(b) Calculate $S(.5)$ for the spline you found in part (a). What
is the percentage error of $S(.5)$ with respect to the "exact"
value of $\sin(.5) = .47943$?

3. The following five points

(0, 1), (1, 7), (2, 27), (3, 79), (4, 181)

lie on a single cubic curve.

(a) Which of the three types of cubic splines (natural, parabolic
runout, or cubic runout) would agree exactly with the single
cubic curve on which the five points lie?

(b) Determine the cubic spline you chose in part (a), and verify
that it is a single cubic curve that interpolates the five points.

4. Repeat the calculations in Example 1 using a natural spline to
interpolate the five data points.

5. Repeat the calculations in Example 1 using a cubic runout spline
to interpolate the five data points.

6. Consider the five points (0, 0), (.5, 1), (1, 0), (1.5, −1), and
(2, 0) on the graph of $y = \sin(\pi x)$.

(a) Use a natural spline to interpolate the data points (0, 0),
(.5, 1), and (1, 0).

(b) Use a natural spline to interpolate the data points (.5, 1),
(1, 0), and (1.5, −1).

(c) Explain the unusual nature of your result in part (b).

7. (The Periodic Spline) If it is known or if it is desired that the $n$
points $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ to be interpolated lie on
a single cycle of a periodic curve with period $x_n - x_1$, then an
interpolating cubic spline $S(x)$ must satisfy

\[
\begin{aligned}
S(x_1) &= S(x_n) \\
S'(x_1) &= S'(x_n) \\
S''(x_1) &= S''(x_n)
\end{aligned}
\]

(a) Show that these three periodicity conditions require that

\[
\begin{aligned}
y_1 &= y_n \\
M_1 &= M_n \\
4M_1 + M_2 + M_{n-1} &= 6(y_{n-1} - 2y_1 + y_2)/h^2
\end{aligned}
\]

(b) Using the three equations in part (a) and Equations (15),
construct an $(n-1) \times (n-1)$ linear system for
$M_1, M_2, \ldots, M_{n-1}$ in matrix form.

8. (The Clamped Spline) Suppose that, in addition to the $n$ points
to be interpolated, we are given specific values $y_1'$ and $y_n'$ for the
slopes $S'(x_1)$ and $S'(x_n)$ of the interpolating cubic spline at the
endpoints $x_1$ and $x_n$.

(a) Show that

\[
\begin{aligned}
2M_1 + M_2 &= 6(y_2 - y_1 - hy_1')/h^2 \\
2M_n + M_{n-1} &= 6(y_{n-1} - y_n + hy_n')/h^2
\end{aligned}
\]

(b) Using the equations in part (a) and Equations (15), construct
an $n \times n$ linear system for $M_1, M_2, \ldots, M_n$ in matrix
form.

Remark The clamped spline described in this exercise is
the most accurate type of spline for interpolation work if
the slopes at the endpoints are known or can be estimated.

Working with Technology

The following exercises are designed to be solved using a technology
utility. Typically, this will be MATLAB, Mathematica, Maple,
Derive, or Mathcad, but it may also be some other type of linear
algebra software or a scientific calculator with some linear algebra
capabilities. For each exercise you will need to read the relevant
documentation for the particular utility you are using. The goal
of these exercises is to provide you with a basic proficiency with
your technology utility. Once you have mastered the techniques
in these exercises, you will be able to use your technology utility
to solve many of the problems in the regular exercise sets.

T1. In the solution of the natural cubic spline problem, it is necessary
to solve a system of equations having coefficient matrix

\[
A_n = \begin{bmatrix}
4 & 1 & 0 & \cdots & 0 & 0 & 0 \\
1 & 4 & 1 & \cdots & 0 & 0 & 0 \\
\vdots & \vdots & \vdots & & \vdots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & 1 & 4 & 1 \\
0 & 0 & 0 & \cdots & 0 & 1 & 4
\end{bmatrix}
\]

If we can present a formula for the inverse of this matrix, then
the solution for the natural cubic spline problem can be easily obtained.
In this exercise and the next, we use a computer to discover
this formula. Toward this end, we first determine an expression
for the determinant of $A_n$, denoted by the symbol $D_n$. Given that

\[ A_1 = [4] \quad \text{and} \quad A_2 = \begin{bmatrix} 4 & 1 \\ 1 & 4 \end{bmatrix} \]

we see that

\[ D_1 = \det(A_1) = \det[4] = 4 \]

and

\[ D_2 = \det(A_2) = \det \begin{bmatrix} 4 & 1 \\ 1 & 4 \end{bmatrix} = 15 \]

(a) Use the cofactor expansion of determinants to show that

\[ D_n = 4D_{n-1} - D_{n-2} \]

for $n = 3, 4, 5, \ldots$. This says, for example, that

\[
\begin{aligned}
D_3 &= 4D_2 - D_1 = 4(15) - 4 = 56 \\
D_4 &= 4D_3 - D_2 = 4(56) - 15 = 209
\end{aligned}
\]

and so on. Using a computer, check this result for $5 \le n \le 10$.

(b) By writing

\[ D_n = 4D_{n-1} - D_{n-2} \]

and the identity $D_{n-1} = D_{n-1}$ in matrix form,

\[
\begin{bmatrix} D_n \\ D_{n-1} \end{bmatrix}
= \begin{bmatrix} 4 & -1 \\ 1 & 0 \end{bmatrix}
\begin{bmatrix} D_{n-1} \\ D_{n-2} \end{bmatrix}
\]

show that

\[
\begin{bmatrix} D_n \\ D_{n-1} \end{bmatrix}
= \begin{bmatrix} 4 & -1 \\ 1 & 0 \end{bmatrix}^{n-2}
\begin{bmatrix} D_2 \\ D_1 \end{bmatrix}
= \begin{bmatrix} 4 & -1 \\ 1 & 0 \end{bmatrix}^{n-2}
\begin{bmatrix} 15 \\ 4 \end{bmatrix}
\]

(c) Use the methods in Section 5.2 and a computer to show that

\[
\begin{bmatrix} 4 & -1 \\ 1 & 0 \end{bmatrix}^{n-2}
= \frac{1}{2\sqrt{3}}
\begin{bmatrix}
(2 + \sqrt{3})^{n-1} - (2 - \sqrt{3})^{n-1} & (2 - \sqrt{3})^{n-2} - (2 + \sqrt{3})^{n-2} \\
(2 + \sqrt{3})^{n-2} - (2 - \sqrt{3})^{n-2} & (2 - \sqrt{3})^{n-3} - (2 + \sqrt{3})^{n-3}
\end{bmatrix}
\]

and hence

\[ D_n = \frac{(2 + \sqrt{3})^{n+1} - (2 - \sqrt{3})^{n+1}}{2\sqrt{3}} \]

for $n = 1, 2, 3, \ldots$.

(d) Using a computer, check this result for $1 \le n \le 10$.

T2. In this exercise, we determine a formula for calculating $A_n^{-1}$
from $D_k$ for $k = 0, 1, 2, 3, \ldots, n$, assuming that $D_0$ is defined to
be 1.

(a) Use a computer to compute $A_k^{-1}$ for $k = 1, 2, 3, 4$, and 5.

(b) From your results in part (a), discover the conjecture that

\[ A_n^{-1} = [\alpha_{ij}] \]

where $\alpha_{ij} = \alpha_{ji}$ and

\[ \alpha_{ij} = (-1)^{i+j} \left( \frac{D_{n-j} D_{i-1}}{D_n} \right) \]

for $i \le j$.

(c) Use the result in part (b) to compute $A_7^{-1}$ and compare it to
the result obtained using the computer.


10.4 Markov Chains 

In this section we describe a general model of a system that changes from state to state. We 
then apply the model to several concrete problems. 

Prerequisites: Linear Systems; Matrices; Intuitive Understanding of Limits

A Markov Process Suppose a physical or mathematical system undergoes a process of change such that at 
any moment it can occupy one of a finite number of states. For example, the weather 
in a certain city could be in one of three possible states: sunny, cloudy, or rainy. Or 
an individual could be in one of four possible emotional states: happy, sad, angry, or 
apprehensive. Suppose that such a system changes with time from one state to another 
and at scheduled times the state of the system is observed. If the state of the system 
at any observation cannot be predicted with certainty, but the probability that a given 
state occurs can be predicted by just knowing the state of the system at the preceding 
observation, then the process of change is called a Markov chain or Markov process. 


DEFINITION 1 If a Markov chain has $k$ possible states, which we label as $1, 2, \ldots, k$,
then the probability that the system is in state $i$ at any observation after it was in state $j$
at the preceding observation is denoted by $p_{ij}$ and is called the transition probability
from state $j$ to state $i$. The matrix $P = [p_{ij}]$ is called the transition matrix of the
Markov chain.


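For example, in a three-state Markov chain, the transition matrix has the form

\[
P = \begin{bmatrix}
p_{11} & p_{12} & p_{13} \\
p_{21} & p_{22} & p_{23} \\
p_{31} & p_{32} & p_{33}
\end{bmatrix}
\]

in which the columns are indexed by the preceding state (1, 2, 3) and the rows by the
new state (1, 2, 3). In this matrix, $p_{32}$ is the probability that the system will change
from state 2 to state 3, $p_{11}$ is the probability that the system will still be in state 1
if it was previously in state 1, and so forth.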




► EXAMPLE 1 Transition Matrix of the Markov Chain

A car rental agency has three rental locations, denoted by 1, 2, and 3. A customer may
rent a car from any of the three locations and return the car to any of the three locations.
The manager finds that customers return the cars to the various locations according to
the following probabilities:

                        Rented from Location
                          1      2      3
                    1    .8     .3     .2
Returned to         2    .1     .2     .6
Location            3    .1     .5     .2

This matrix is the transition matrix of the system considered as a Markov chain. From
this matrix, the probability is .6 that a car rented from location 3 will be returned to
location 2, the probability is .8 that a car rented from location 1 will be returned to
location 1, and so forth. ◄


► EXAMPLE 2 Transition Matrix of the Markov Chain 

By reviewing its donation records, the alumni office of a college finds that 80% of its 
alumni who contribute to the annual fund one year will also contribute the next year, 
and 30% of those who do not contribute one year will contribute the next. This can be 
viewed as a Markov chain with two states: state 1 corresponds to an alumnus giving a 
donation in any one year, and state 2 corresponds to the alumnus not giving a donation 
in that year. The transition matrix is 


\[ P = \begin{bmatrix} .8 & .3 \\ .2 & .7 \end{bmatrix} \]

◄


In the examples above, the transition matrices of the Markov chains have the property
that the entries in any column sum to 1. This is not accidental. If $P = [p_{ij}]$ is the
transition matrix of any Markov chain with $k$ states, then for each $j$ we must have

\[ p_{1j} + p_{2j} + \cdots + p_{kj} = 1 \tag{1} \]

because if the system is in state j at one observation, it is certain to be in one of the k 
possible states at the next observation. 

A matrix with property (1) is called a stochastic matrix , a probability matrix , or a 
Markov matrix. From the preceding discussion, it follows that the transition matrix for 
a Markov chain must be a stochastic matrix. 
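Property (1) is easy to test mechanically. The sketch below is an illustration only: the function name `is_stochastic` and the variable names are hypothetical, the matrices are those of Examples 1 and 2 stored as lists of rows, and exact fractions are used so that the column sums can be compared with 1 without rounding error. The check for nonnegative entries is added because the entries are probabilities.

```python
from fractions import Fraction as F


def is_stochastic(P):
    """True if P is square, has nonnegative entries, and each column sums to 1."""
    k = len(P)
    if any(len(row) != k for row in P):
        return False
    if any(entry < 0 for row in P for entry in row):
        return False
    return all(sum(P[i][j] for i in range(k)) == 1 for j in range(k))


# Example 1 (car rental) and Example 2 (alumni donations).
P_rental = [[F(8, 10), F(3, 10), F(2, 10)],
            [F(1, 10), F(2, 10), F(6, 10)],
            [F(1, 10), F(5, 10), F(2, 10)]]
P_alumni = [[F(8, 10), F(3, 10)],
            [F(2, 10), F(7, 10)]]

print(is_stochastic(P_rental), is_stochastic(P_alumni))
```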

In a Markov chain, the state of the system at any observation time cannot generally 
be determined with certainty. The best one can usually do is specify probabilities for 
each of the possible states. For example, in a Markov chain with three states, we might 
describe the possible state of the system at some observation time by a column vector

\[ \mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \]

in which $x_1$ is the probability that the system is in state 1, $x_2$ the probability that it is
in state 2, and $x_3$ the probability that it is in state 3. In general we make the following
definition.
DEFINITION 2 The state vector for an observation of a Markov chain with $k$ states is
a column vector $\mathbf{x}$ whose $i$th component $x_i$ is the probability that the system is in the
$i$th state at that time.


Observe that the entries in any state vector for a Markov chain are nonnegative and have 
a sum of 1. (Why?) A column vector that has this property is called a probability vector. 

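Let us suppose now that we know the state vector $\mathbf{x}^{(0)}$ for a Markov chain at some
initial observation. The following theorem will enable us to determine the state vectors
at the subsequent observation times.

THEOREM 10.4.1 If $P$ is the transition matrix of a Markov chain and $\mathbf{x}^{(n)}$ is the state
vector at the $n$th observation, then $\mathbf{x}^{(n+1)} = P\mathbf{x}^{(n)}$.

The proof of this theorem involves ideas from probability theory and will not be given
here. From this theorem, it follows that

\[
\begin{aligned}
\mathbf{x}^{(1)} &= P\mathbf{x}^{(0)} \\
\mathbf{x}^{(2)} &= P\mathbf{x}^{(1)} = P^2\mathbf{x}^{(0)} \\
\mathbf{x}^{(3)} &= P\mathbf{x}^{(2)} = P^3\mathbf{x}^{(0)} \\
&\;\;\vdots \\
\mathbf{x}^{(n)} &= P\mathbf{x}^{(n-1)} = P^n\mathbf{x}^{(0)}
\end{aligned}
\]

In this way, the initial state vector $\mathbf{x}^{(0)}$ and the transition matrix $P$ determine $\mathbf{x}^{(n)}$ for
$n = 1, 2, \ldots$.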


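► EXAMPLE 3 Example 2 Revisited

The transition matrix in Example 2 was

\[ P = \begin{bmatrix} .8 & .3 \\ .2 & .7 \end{bmatrix} \]

We now construct the probable future donation record of a new graduate who did not
give a donation in the initial year after graduation. For such a graduate the system is
initially in state 2 with certainty, so the initial state vector is

\[ \mathbf{x}^{(0)} = \begin{bmatrix} 0 \\ 1 \end{bmatrix} \]

From Theorem 10.4.1 we then have

\[
\begin{aligned}
\mathbf{x}^{(1)} &= P\mathbf{x}^{(0)} = \begin{bmatrix} .8 & .3 \\ .2 & .7 \end{bmatrix} \begin{bmatrix} 0 \\ 1 \end{bmatrix} = \begin{bmatrix} .3 \\ .7 \end{bmatrix} \\
\mathbf{x}^{(2)} &= P\mathbf{x}^{(1)} = \begin{bmatrix} .8 & .3 \\ .2 & .7 \end{bmatrix} \begin{bmatrix} .3 \\ .7 \end{bmatrix} = \begin{bmatrix} .45 \\ .55 \end{bmatrix} \\
\mathbf{x}^{(3)} &= P\mathbf{x}^{(2)} = \begin{bmatrix} .8 & .3 \\ .2 & .7 \end{bmatrix} \begin{bmatrix} .45 \\ .55 \end{bmatrix} = \begin{bmatrix} .525 \\ .475 \end{bmatrix}
\end{aligned}
\]

Thus, after three years the alumnus can be expected to make a donation with probability
.525. Beyond three years, we find the following state vectors (to three decimal places):

\[
\mathbf{x}^{(4)} = \begin{bmatrix} .563 \\ .438 \end{bmatrix}, \quad
\mathbf{x}^{(5)} = \begin{bmatrix} .581 \\ .419 \end{bmatrix}, \quad
\mathbf{x}^{(6)} = \begin{bmatrix} .591 \\ .409 \end{bmatrix}, \quad
\mathbf{x}^{(7)} = \begin{bmatrix} .595 \\ .405 \end{bmatrix}
\]

\[
\mathbf{x}^{(8)} = \begin{bmatrix} .598 \\ .402 \end{bmatrix}, \quad
\mathbf{x}^{(9)} = \begin{bmatrix} .599 \\ .401 \end{bmatrix}, \quad
\mathbf{x}^{(10)} = \begin{bmatrix} .599 \\ .401 \end{bmatrix}, \quad
\mathbf{x}^{(11)} = \begin{bmatrix} .600 \\ .400 \end{bmatrix}
\]

For all $n$ beyond 11, we have

\[ \mathbf{x}^{(n)} = \begin{bmatrix} .600 \\ .400 \end{bmatrix} \]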

to three decimal places. In other words, the state vectors converge to a fixed vector as 
the number of observations increases. (We will discuss this further below.) 
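The repeated multiplication in this example is a natural candidate for a short loop. The following Python sketch, in which the helper name `step` and the list `history` are hypothetical, applies Theorem 10.4.1 eleven times to the alumni matrix using exact fractions; it is meant only to show the mechanics of the iteration.

```python
from fractions import Fraction as F


def step(P, x):
    """One observation step: return the matrix-vector product P x."""
    return [sum(P[i][j] * x[j] for j in range(len(x))) for i in range(len(P))]


P = [[F(8, 10), F(3, 10)],
     [F(2, 10), F(7, 10)]]
x = [F(0), F(1)]  # state 2 (no donation) with certainty

history = [x]  # history[n] holds the state vector at observation n
for _ in range(11):
    x = step(P, x)
    history.append(x)

for n, vec in enumerate(history):
    print(n, [f"{float(v):.4f}" for v in vec])
```

Four decimal places are printed so that the slow approach of the first component to .6 remains visible; values that fall exactly on a rounding boundary, such as .5625, may be rounded differently by software than in the printed table.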


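► EXAMPLE 4 Example 1 Revisited

The transition matrix in Example 1 was

\[ P = \begin{bmatrix} .8 & .3 & .2 \\ .1 & .2 & .6 \\ .1 & .5 & .2 \end{bmatrix} \]

If a car is rented initially from location 2, then the initial state vector is

\[ \mathbf{x}^{(0)} = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} \]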

Using this vector and Theorem 10.4.1, one obtains the later state vectors listed in Table 1 . 


Table 1

  n          0     1     2     3     4     5     6     7     8     9     10    11
x_1^(n)      0   .300  .400  .477  .511  .533  .544  .550  .553  .555  .556  .557
x_2^(n)      1   .200  .370  .252  .261  .240  .238  .233  .232  .231  .230  .230
x_3^(n)      0   .500  .230  .271  .228  .227  .219  .217  .215  .214  .214  .213

For all values of n greater than 11, all state vectors are equal to x (11) to three decimal 
places. 

Two things should be observed in this example. First, it was not necessary to know 
how long a customer kept the car. That is, in a Markov process the time period between 
observations need not be regular. Second, the state vectors approach a fixed vector as n 
increases, just as in the first example. 


► EXAMPLE 5 Using Theorem 10.4.1 

A traffic officer is assigned to control the traffic at the eight intersections indicated in
Figure 10.4.1. She is instructed to remain at each intersection for an hour and then to
either remain at the same intersection or move to a neighboring intersection. To avoid
establishing a pattern, she is told to choose her new intersection on a random basis,
with each possible choice equally likely. For example, if she is at intersection 5, her next
intersection can be 2, 4, 5, or 8, each with probability 1/4. Every day she starts at the
location where she stopped the day before. The transition matrix for this Markov chain
is

                              Old Intersection
                     1     2     3     4     5     6     7     8
                1   1/3   1/3    0    1/5    0     0     0     0
                2   1/3   1/3    0     0    1/4    0     0     0
                3    0     0    1/3   1/5    0    1/3    0     0
New             4   1/3    0    1/3   1/5   1/4    0    1/4    0
Intersection    5    0    1/3    0    1/5   1/4    0     0    1/3
                6    0     0    1/3    0     0    1/3   1/4    0
                7    0     0     0    1/5    0    1/3   1/4   1/3
                8    0     0     0     0    1/4    0    1/4   1/3

[Figure 10.4.1: street map showing the eight numbered intersections]



If the traffic officer begins at intersection 5, her probable locations, hour by hour, are 
given by the state vectors given in Table 2. For all values of n greater than 22, all state 
vectors are equal to x (22) to three decimal places. Thus, as with the first two examples, 
the state vectors approach a fixed vector as n increases. 
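The transition matrix of this example follows mechanically from the rule "stay or move to a neighbor, all choices equally likely". The Python sketch below rebuilds it under one assumption that should be stated plainly: the list `EDGES` of neighboring intersections is read off the nonzero off-diagonal entries of the printed matrix, since the street map itself is not reproduced here. All function and variable names are hypothetical.

```python
from fractions import Fraction as F

# Pairs of neighboring intersections, inferred from the nonzero
# off-diagonal entries of the printed transition matrix (an assumption).
EDGES = [(1, 2), (1, 4), (2, 5), (3, 4), (3, 6),
         (4, 5), (4, 7), (5, 8), (6, 7), (7, 8)]
K = 8


def build_transition_matrix():
    """Column j spreads probability equally over j and its neighbors."""
    choices = {j: {j} for j in range(1, K + 1)}
    for a, b in EDGES:
        choices[a].add(b)
        choices[b].add(a)
    P = [[F(0)] * K for _ in range(K)]
    for j, options in choices.items():
        for i in options:
            P[i - 1][j - 1] = F(1, len(options))
    return P


def matmul(A, B):
    """Product of two K x K matrices stored as lists of rows."""
    return [[sum(A[i][t] * B[t][j] for t in range(K)) for j in range(K)]
            for i in range(K)]


P = build_transition_matrix()
P2 = matmul(P, P)
P3 = matmul(P2, P)
P4 = matmul(P3, P)

# Start at intersection 5 and advance 22 hours with x^(n+1) = P x^(n).
x = [F(0)] * K
x[4] = F(1)
for _ in range(22):
    x = [sum(P[i][j] * x[j] for j in range(K)) for i in range(K)]
print([f"{float(v):.3f}" for v in x])
```

The intermediate products `P3` and `P4` are kept so that one can inspect where zero entries remain after three steps and confirm that none remain after four.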


Table 2

  n          0     1     2     3     4     5     10    15    20    22
x_1^(n)      0   .000  .133  .116  .130  .123  .113  .109  .108  .107
x_2^(n)      0   .250  .146  .163  .140  .138  .115  .109  .108  .107
x_3^(n)      0   .000  .050  .039  .067  .073  .100  .106  .107  .107
x_4^(n)      0   .250  .113  .187  .162  .178  .178  .179  .179  .179
x_5^(n)      1   .250  .279  .190  .190  .168  .149  .144  .143  .143
x_6^(n)      0   .000  .000  .050  .056  .074  .099  .105  .107  .107
x_7^(n)      0   .000  .133  .104  .131  .125  .138  .142  .143  .143
x_8^(n)      0   .250  .146  .152  .124  .121  .108  .107  .107  .107


Limiting Behavior of the 
State Vectors 


In our examples we saw that the state vectors approached some fixed vector as the number 
of observations increased. We now ask whether the state vectors always approach a fixed 
vector in a Markov chain. A simple example shows that this is not the case. 


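► EXAMPLE 6 System Oscillates Between Two State Vectors

Let

\[
P = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}
\quad \text{and} \quad
\mathbf{x}^{(0)} = \begin{bmatrix} 1 \\ 0 \end{bmatrix}
\]

Then, because $P^2 = I$ and $P^3 = P$, we have that

\[ \mathbf{x}^{(0)} = \mathbf{x}^{(2)} = \mathbf{x}^{(4)} = \cdots = \begin{bmatrix} 1 \\ 0 \end{bmatrix} \]

and

\[ \mathbf{x}^{(1)} = \mathbf{x}^{(3)} = \mathbf{x}^{(5)} = \cdots = \begin{bmatrix} 0 \\ 1 \end{bmatrix} \]

This system oscillates indefinitely between the two state vectors
$\begin{bmatrix} 1 \\ 0 \end{bmatrix}$ and $\begin{bmatrix} 0 \\ 1 \end{bmatrix}$, so it does
not approach any fixed vector. ◄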

However, if we impose a mild condition on the transition matrix, we can show that 
a fixed limiting state vector is approached. This condition is described by the following 
definition. 


DEFINITION 3 A transition matrix is regular if some integer power of it has all positive 
entries. 


Thus, for a regular transition matrix P, there is some positive integer m such that all 
entries of P m are positive. This is the case with the transition matrices of Examples 1 and 
2 for m = 1 . In Example 5 it turns out that P 4 has all positive entries. Consequently, in 
all three examples the transition matrices are regular. 
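Checking Definition 3 by hand means raising $P$ to successive powers until no zero entry remains. The sketch below automates that search. Two things in it are not from the text and should be treated as assumptions: the function name `first_positive_power`, and the stopping rule, which relies on Wielandt's bound from the theory of nonnegative matrices (if a $k \times k$ nonnegative matrix has any power with all positive entries, then the power $(k-1)^2 + 1$ already has this property).

```python
def matmul(A, B):
    """Product of two square matrices stored as lists of rows."""
    k = len(A)
    return [[sum(A[i][t] * B[t][j] for t in range(k)) for j in range(k)]
            for i in range(k)]


def first_positive_power(P):
    """Return the smallest m with every entry of P^m positive, or None.

    The search stops at m = (k - 1)^2 + 1 (Wielandt's bound, an outside
    fact assumed here); if no power up to that exponent is positive,
    the matrix is not regular.
    """
    k = len(P)
    # Only the pattern of zero and nonzero entries matters, so use 0/1 entries.
    pattern = [[1 if v > 0 else 0 for v in row] for row in P]
    power = pattern
    for m in range(1, (k - 1) ** 2 + 2):
        if all(v > 0 for row in power for v in row):
            return m
        product = matmul(power, pattern)
        power = [[1 if v > 0 else 0 for v in row] for row in product]
    return None


print(first_positive_power([[0.8, 0.3], [0.2, 0.7]]))  # matrix of Example 2
print(first_positive_power([[0, 1], [1, 0]]))          # matrix of Example 6
```

Reducing every product to a 0/1 pattern keeps the entries from shrinking toward zero in floating point, which could otherwise make a positive entry look like a zero.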

A Markov chain that is governed by a regular transition matrix is called a regular 
Markov chain. We will see that every regular Markov chain has a fixed state vector $\mathbf{q}$ such
that $P^n\mathbf{x}^{(0)}$ approaches $\mathbf{q}$ as $n$ increases for any choice of $\mathbf{x}^{(0)}$. This result is of major
importance in the theory of Markov chains. It is based on the following theorem.


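THEOREM 10.4.2 Behavior of $P^n$ as $n \to \infty$

If $P$ is a regular transition matrix, then as $n \to \infty$,

\[
P^n \to \begin{bmatrix}
q_1 & q_1 & \cdots & q_1 \\
q_2 & q_2 & \cdots & q_2 \\
\vdots & \vdots & & \vdots \\
q_k & q_k & \cdots & q_k
\end{bmatrix}
\]

where the $q_i$ are positive numbers such that $q_1 + q_2 + \cdots + q_k = 1$.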


We will not prove this theorem here. We refer you to a more specialized text, such as 
J. Kemeny and J. Snell, Finite Markov Chains (New York: Springer-Verlag, 1976).

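Let us set

\[
Q = \begin{bmatrix}
q_1 & q_1 & \cdots & q_1 \\
q_2 & q_2 & \cdots & q_2 \\
\vdots & \vdots & & \vdots \\
q_k & q_k & \cdots & q_k
\end{bmatrix}
\quad \text{and} \quad
\mathbf{q} = \begin{bmatrix} q_1 \\ q_2 \\ \vdots \\ q_k \end{bmatrix}
\]

Thus, $Q$ is a transition matrix, all of whose columns are equal to the probability vector
$\mathbf{q}$. $Q$ has the property that if $\mathbf{x}$ is any probability vector, then

\[
\begin{aligned}
Q\mathbf{x} &= \begin{bmatrix}
q_1 & q_1 & \cdots & q_1 \\
q_2 & q_2 & \cdots & q_2 \\
\vdots & \vdots & & \vdots \\
q_k & q_k & \cdots & q_k
\end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_k \end{bmatrix}
= \begin{bmatrix}
q_1x_1 + q_1x_2 + \cdots + q_1x_k \\
q_2x_1 + q_2x_2 + \cdots + q_2x_k \\
\vdots \\
q_kx_1 + q_kx_2 + \cdots + q_kx_k
\end{bmatrix} \\
&= (x_1 + x_2 + \cdots + x_k) \begin{bmatrix} q_1 \\ q_2 \\ \vdots \\ q_k \end{bmatrix}
= (1)\mathbf{q} = \mathbf{q}
\end{aligned}
\]

That is, $Q$ transforms any probability vector $\mathbf{x}$ into the fixed probability vector $\mathbf{q}$. This
result leads to the following theorem.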


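THEOREM 10.4.3 Behavior of $P^n\mathbf{x}$ as $n \to \infty$

If $P$ is a regular transition matrix and $\mathbf{x}$ is any probability vector, then as $n \to \infty$,

\[ P^n\mathbf{x} \to \begin{bmatrix} q_1 \\ q_2 \\ \vdots \\ q_k \end{bmatrix} = \mathbf{q} \]

where $\mathbf{q}$ is a fixed probability vector, independent of $n$, all of whose entries are positive.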


This result holds since Theorem 10.4.2 implies that $P^n \to Q$ as $n \to \infty$. This in turn
implies that $P^n\mathbf{x} \to Q\mathbf{x} = \mathbf{q}$ as $n \to \infty$. Thus, for a regular Markov chain, the system
eventually approaches a fixed state vector q. The vector q is called the steady-state vector 
of the regular Markov chain. 
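The convergence asserted by Theorems 10.4.2 and 10.4.3 can be observed numerically. The following sketch, with hypothetical names `matmul`, `power`, and `approx_q`, raises the alumni matrix of Example 2 to the 30th power with exact fractions and then applies that power to an arbitrarily chosen probability vector; the exponent 30 and the test vector are illustrative choices, not values from the text.

```python
from fractions import Fraction as F


def matmul(A, B):
    """Product of two square matrices stored as lists of rows."""
    k = len(A)
    return [[sum(A[i][t] * B[t][j] for t in range(k)) for j in range(k)]
            for i in range(k)]


P = [[F(8, 10), F(3, 10)],
     [F(2, 10), F(7, 10)]]

power = P
for _ in range(29):  # after the loop, power holds P^30
    power = matmul(power, P)

for row in power:
    print([f"{float(v):.6f}" for v in row])

# An arbitrary probability vector is sent (almost) to the common column.
x = [F(1, 4), F(3, 4)]
approx_q = [sum(power[i][j] * x[j] for j in range(2)) for i in range(2)]
print([f"{float(v):.6f}" for v in approx_q])
```

Both columns of the computed power should be nearly identical, and `approx_q` should nearly coincide with that common column whatever probability vector is used for `x`.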

For systems with many states, usually the most efficient technique of computing the 
steady-state vector q is simply to calculate P n x for some large n. Our examples illustrate 
this procedure. Each is a regular Markov process, so that convergence to a steady-state 
vector is ensured. Another way of computing the steady-state vector is to make use of 
the following theorem. 


THEOREM 10.4.4 Steady-State Vector

The steady-state vector $\mathbf{q}$ of a regular transition matrix $P$ is the unique probability
vector that satisfies the equation $P\mathbf{q} = \mathbf{q}$.


To see this, consider the matrix identity $PP^n = P^{n+1}$. By Theorem 10.4.2, both $P^n$ and
$P^{n+1}$ approach $Q$ as $n \to \infty$. Thus, we have $PQ = Q$. Any one column of this matrix
equation gives $P\mathbf{q} = \mathbf{q}$. To show that $\mathbf{q}$ is the only probability vector that satisfies this
equation, suppose $\mathbf{r}$ is another probability vector such that $P\mathbf{r} = \mathbf{r}$. Then also $P^n\mathbf{r} = \mathbf{r}$
for $n = 1, 2, \ldots$. When we let $n \to \infty$, Theorem 10.4.3 leads to $\mathbf{q} = \mathbf{r}$.

Theorem 10.4.4 can also be expressed by the statement that the homogeneous linear 
system 

\[ (I - P)\mathbf{q} = \mathbf{0} \]

has a unique solution vector $\mathbf{q}$ with nonnegative entries that satisfy the condition
$q_1 + q_2 + \cdots + q_k = 1$. We can apply this technique to the computation of the steady-state
vectors for our examples.


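► EXAMPLE 7 Example 2 Revisited

In Example 2 the transition matrix was

\[ P = \begin{bmatrix} .8 & .3 \\ .2 & .7 \end{bmatrix} \]

so the linear system $(I - P)\mathbf{q} = \mathbf{0}$ is

\[
\begin{bmatrix} .2 & -.3 \\ -.2 & .3 \end{bmatrix}
\begin{bmatrix} q_1 \\ q_2 \end{bmatrix}
= \begin{bmatrix} 0 \\ 0 \end{bmatrix}
\tag{2}
\]

This leads to the single independent equation

\[ .2q_1 - .3q_2 = 0 \]

or $q_1 = 1.5q_2$. Thus, when we set $q_2 = s$, any solution of (2) is of the form

\[ \mathbf{q} = s \begin{bmatrix} 1.5 \\ 1 \end{bmatrix} \]

where $s$ is an arbitrary constant. To make the vector $\mathbf{q}$ a probability vector, we set
$s = 1/(1.5 + 1) = .4$. Consequently,

\[ \mathbf{q} = \begin{bmatrix} .6 \\ .4 \end{bmatrix} \]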


is the steady-state vector of this regular Markov chain. This means that over the long 
run, 60% of the alumni will give a donation in any one year, and 40% will not. Observe 
that this agrees with the result obtained numerically in Example 3. 

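► EXAMPLE 8 Example 1 Revisited

In Example 1 the transition matrix was

\[ P = \begin{bmatrix} .8 & .3 & .2 \\ .1 & .2 & .6 \\ .1 & .5 & .2 \end{bmatrix} \]

so the linear system $(I - P)\mathbf{q} = \mathbf{0}$ is

\[
\begin{bmatrix} .2 & -.3 & -.2 \\ -.1 & .8 & -.6 \\ -.1 & -.5 & .8 \end{bmatrix}
\begin{bmatrix} q_1 \\ q_2 \\ q_3 \end{bmatrix}
= \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}
\]

The reduced row echelon form of the coefficient matrix is (verify)

\[
\begin{bmatrix} 1 & 0 & -\frac{34}{13} \\ 0 & 1 & -\frac{14}{13} \\ 0 & 0 & 0 \end{bmatrix}
\]

so the original linear system is equivalent to the system

\[
\begin{aligned}
q_1 &= \left(\tfrac{34}{13}\right) q_3 \\
q_2 &= \left(\tfrac{14}{13}\right) q_3
\end{aligned}
\]

When we set $q_3 = s$, any solution of the linear system is of the form

\[ \mathbf{q} = s \begin{bmatrix} \frac{34}{13} \\ \frac{14}{13} \\ 1 \end{bmatrix} \]

To make this a probability vector, we set

\[ s = \frac{1}{\frac{34}{13} + \frac{14}{13} + 1} = \frac{13}{61} \]

Thus, the steady-state vector of the system is

\[
\mathbf{q} = \begin{bmatrix} \frac{34}{61} \\ \frac{14}{61} \\ \frac{13}{61} \end{bmatrix}
= \begin{bmatrix} .5573\ldots \\ .2295\ldots \\ .2131\ldots \end{bmatrix}
\]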


This agrees with the result obtained numerically in Table 1 . The entries of q give the 
long-run probabilities that any one car will be returned to location 1, 2, or 3, respectively. 
If the car rental agency has a fleet of 1000 cars, it should design its facilities so that there 
are at least 558 spaces at location 1, at least 230 spaces at location 2, and at least 214 
spaces at location 3. 
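The procedure of Examples 7 and 8 can be carried out exactly by machine. The sketch below is one possible arrangement, not the method of the text: instead of row reducing $I - P$ and normalizing afterward, it replaces the last equation of $(I - P)\mathbf{q} = \mathbf{0}$ with the condition $q_1 + \cdots + q_k = 1$ (one row of $I - P$ is redundant because its rows sum to zero) and solves the resulting square system by Gauss-Jordan elimination. The function name `steady_state` is hypothetical, and $P$ is assumed to be regular with entries given as fractions.

```python
from fractions import Fraction as F


def steady_state(P):
    """Solve (I - P) q = 0 together with q_1 + ... + q_k = 1, exactly.

    Assumes P is a regular transition matrix whose entries are Fractions,
    so that the solution exists and is unique.
    """
    k = len(P)
    A = [[(F(1) if i == j else F(0)) - P[i][j] for j in range(k)]
         for i in range(k)]
    A[-1] = [F(1)] * k                 # normalization replaces the last row
    b = [F(0)] * (k - 1) + [F(1)]
    aug = [row + [rhs] for row, rhs in zip(A, b)]
    for col in range(k):
        # Find a row with a nonzero entry in this column and move it up.
        pivot = next(r for r in range(col, k) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [v / aug[col][col] for v in aug[col]]
        for r in range(k):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [aug[i][k] for i in range(k)]


P_rental = [[F(8, 10), F(3, 10), F(2, 10)],
            [F(1, 10), F(2, 10), F(6, 10)],
            [F(1, 10), F(5, 10), F(2, 10)]]
P_alumni = [[F(8, 10), F(3, 10)],
            [F(2, 10), F(7, 10)]]

print(steady_state(P_rental))
print(steady_state(P_alumni))
```

Exact fractions make the results directly comparable with the fractional steady-state vectors found by hand in Examples 7 and 8.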

► EXAMPLE 9 Example 5 Revisited

We will not give the details of the calculations but simply state that the unique probability
vector solution of the linear system $(I - P)\mathbf{q} = \mathbf{0}$ is

\[
\mathbf{q} = \begin{bmatrix}
\frac{3}{28} \\ \frac{3}{28} \\ \frac{3}{28} \\ \frac{5}{28} \\ \frac{4}{28} \\ \frac{3}{28} \\ \frac{4}{28} \\ \frac{3}{28}
\end{bmatrix}
= \begin{bmatrix}
.1071\ldots \\ .1071\ldots \\ .1071\ldots \\ .1785\ldots \\ .1428\ldots \\ .1071\ldots \\ .1428\ldots \\ .1071\ldots
\end{bmatrix}
\]

The entries in this vector indicate the proportion of time the traffic officer spends at
each intersection over the long term. Thus, if the objective is for her to spend the same
proportion of time at each intersection, then the strategy of random movement with equal
probabilities from one intersection to another is not a good one. (See Exercise 5.) ◄


Exercise Set 10.4

1. Consider the transition matrix

\[ P = \begin{bmatrix} .4 & .5 \\ .6 & .5 \end{bmatrix} \]

(a) Calculate $\mathbf{x}^{(n)}$ for $n = 1, 2, 3, 4, 5$ if $\mathbf{x}^{(0)} =$

(b) State why $P$ is regular and find its steady-state vector.

2. Consider the transition matrix

\[ P = \begin{bmatrix} .2 & .1 & .7 \\ .6 & .4 & .2 \\ .2 & .5 & .1 \end{bmatrix} \]

(a) Calculate $\mathbf{x}^{(1)}$, $\mathbf{x}^{(2)}$, and $\mathbf{x}^{(3)}$ to three decimal places if $\mathbf{x}^{(0)} =$

(b) State why $P$ is regular and find its steady-state vector.

3. Find the steady-state vectors of the following regular transition
matrices:

\[
\text{(a)} \begin{bmatrix} \frac{1}{3} & \frac{3}{4} \\ \frac{2}{3} & \frac{1}{4} \end{bmatrix}
\qquad
\text{(b)} \begin{bmatrix} .81 & .26 \\ .19 & .74 \end{bmatrix}
\qquad
\text{(c)} \begin{bmatrix} \frac{1}{3} & \frac{1}{2} & 0 \\ \frac{1}{3} & 0 & \frac{1}{4} \\ \frac{1}{3} & \frac{1}{2} & \frac{3}{4} \end{bmatrix}
\]

4. Let $P$ be the transition matrix

\[ P = \begin{bmatrix} \frac{1}{2} & 0 \\ \frac{1}{2} & 1 \end{bmatrix} \]

(a) Show that $P$ is not regular.

(b) Show that as $n$ increases, $P^n\mathbf{x}^{(0)}$ approaches $\begin{bmatrix} 0 \\ 1 \end{bmatrix}$ for any
initial state vector $\mathbf{x}^{(0)}$.

(c) What conclusion of Theorem 10.4.3 is not valid for the
steady state of this transition matrix?

5. Verify that if $P$ is a $k \times k$ regular transition matrix all of whose
row sums are equal to 1, then the entries of its steady-state
vector are all equal to $1/k$.

6. Show that the transition matrix

is regular, and use Exercise 5 to find its steady-state vector.

7. John is either happy or sad. If he is happy one day, then he is 
happy the next day four times out of five. If he is sad one day, 
then he is sad the next day one time out of three. Over the long 
term, what are the chances that John is happy on any given day? 

8. A country is divided into three demographic regions. It is found 
that each year 5% of the residents of region 1 move to region 2,
and 5% move to region 3. Of the residents of region 2, 15% move
to region 1 and 10% move to region 3. And of the residents of
region 3, 10% move to region 1 and 5% move to region 2. What 
percentage of the population resides in each of the three regions 
after a long period of time? 

Working with Technology 

The following exercises are designed to be solved using a technology
utility. Typically, this will be MATLAB, Mathematica, Maple,
Derive, or Mathcad, but it may also be some other type of linear
algebra software or a scientific calculator with some linear algebra 
capabilities. For each exercise you will need to read the relevant 
documentation for the particular utility you are using. The goal 
of these exercises is to provide you with a basic proficiency with 
your technology utility. Once you have mastered the techniques 
in these exercises, you will be able to use your technology utility 
to solve many of the problems in the regular exercise sets. 

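T1. Consider the sequence of transition matrices

\[ \{P_2, P_3, P_4, \ldots\} \]

with

\[
P_2 = \begin{bmatrix} 0 & \frac{1}{2} \\ 1 & \frac{1}{2} \end{bmatrix}, \qquad
P_3 = \begin{bmatrix} 0 & 0 & \frac{1}{3} \\ 0 & \frac{1}{2} & \frac{1}{3} \\ 1 & \frac{1}{2} & \frac{1}{3} \end{bmatrix},
\]

\[
P_4 = \begin{bmatrix}
0 & 0 & 0 & \frac{1}{4} \\
0 & 0 & \frac{1}{3} & \frac{1}{4} \\
0 & \frac{1}{2} & \frac{1}{3} & \frac{1}{4} \\
1 & \frac{1}{2} & \frac{1}{3} & \frac{1}{4}
\end{bmatrix}, \qquad
P_5 = \begin{bmatrix}
0 & 0 & 0 & 0 & \frac{1}{5} \\
0 & 0 & 0 & \frac{1}{4} & \frac{1}{5} \\
0 & 0 & \frac{1}{3} & \frac{1}{4} & \frac{1}{5} \\
0 & \frac{1}{2} & \frac{1}{3} & \frac{1}{4} & \frac{1}{5} \\
1 & \frac{1}{2} & \frac{1}{3} & \frac{1}{4} & \frac{1}{5}
\end{bmatrix}
\]

and so on.

(a) Use a computer to show that each of these four matrices is
regular by computing their squares.

(b) Verify Theorem 10.4.2 by computing the 100th power of $P_k$
for $k = 2, 3, 4, 5$. Then make a conjecture as to the limiting
value of $P_k^n$ as $n \to \infty$ for all $k = 2, 3, 4, \ldots$.

(c) Verify that the common column $\mathbf{q}_k$ of the limiting matrix you
found in part (b) satisfies the equation $P_k\mathbf{q}_k = \mathbf{q}_k$, as required
by Theorem 10.4.4.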


T2. A mouse is placed in a box with nine rooms as shown in the
accompanying figure. Assume that it is equally likely that the mouse
goes through any door in the room or stays in the room. 

(a) Construct the 9x9 transition matrix for this problem and 
show that it is regular. 

(b) Determine the steady-state vector for the matrix. 

(c) Use a symmetry argument to show that this problem may be 
solved using only a 3 x 3 matrix. 



Figure Ex-T2


10.5 Graph Theory

In this section we introduce matrix representations of relations among members of a set.
We use matrix arithmetic to analyze these relationships.

PREREQUISITES: Matrix Addition and Multiplication


Relations Among Members of a Set

There are countless examples of sets with finitely many members in which some relation
exists among members of the set. For example, the set could consist of a collection of
people, animals, countries, companies, sports teams, or cities; and the relation between
two members, A and B, of such a set could be that person A dominates person B,
animal A feeds on animal B, country A militarily supports country B, company A sells
its product to company B, sports team A consistently beats sports team B, or city A has
a direct airline flight to city B.

We will now show how the theory of directed graphs can be used to mathematically
model relations such as those in the preceding examples.


Directed Graphs

A directed graph is a finite set of elements, $\{P_1, P_2, \ldots, P_n\}$, together with a finite
collection of ordered pairs $(P_i, P_j)$ of distinct elements of this set, with no ordered pair being
repeated. The elements of the set are called vertices, and the ordered pairs are called
directed edges, of the directed graph. We use the notation $P_i \to P_j$ (which is read “$P_i$
is connected to $P_j$”) to indicate that the directed edge $(P_i, P_j)$ belongs to the directed
graph. Geometrically, we can visualize a directed graph (Figure 10.5.1) by representing
the vertices as points in the plane and representing the directed edge $P_i \to P_j$ by drawing
a line or arc from vertex $P_i$ to vertex $P_j$, with an arrow pointing from $P_i$ to $P_j$. If both
$P_i \to P_j$ and $P_j \to P_i$ hold (denoted $P_i \leftrightarrow P_j$), we draw a single line between $P_i$ and
$P_j$ with two oppositely pointing arrows (as with $P_2$ and $P_3$ in the figure).

▲ Figure 10.5.1

▲ Figure 10.5.2

As in Figure 10.5.1, for example, a directed graph may have separate “components” 
of vertices that are connected only among themselves; and some vertices, such as $P_5$,
may not be connected with any other vertex. Also, because $P_i \to P_i$ is not permitted in
a directed graph, a vertex cannot be connected with itself by a single arc that does not 
pass through any other vertex. 

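Figure 10.5.2 shows diagrams representing three more examples of directed graphs.
With a directed graph having n vertices, we may associate an n × n matrix $M = [m_{ij}]$,
called the vertex matrix of the directed graph. Its elements are defined by

\[
m_{ij} = \begin{cases} 1, & \text{if } P_i \to P_j \\ 0, & \text{otherwise} \end{cases}
\]

for $i, j = 1, 2, \ldots, n$. For the three directed graphs in Figure 10.5.2, the corresponding
vertex matrices are

Figure 10.5.2a:
\[
M = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{bmatrix}
\]

Figure 10.5.2b:
\[
M = \begin{bmatrix} 0 & 1 & 0 & 0 & 1 \\ 0 & 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 & 1 \\ 0 & 1 & 1 & 0 & 0 \end{bmatrix}
\]

Figure 10.5.2c:
\[
M = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \end{bmatrix}
\]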


▲ Figure 10.5.3 


By their definition, vertex matrices have the following two properties:

(i) All entries are either 0 or 1.

(ii) All diagonal entries are 0.

Conversely, any matrix with these two properties determines a unique directed graph
having the given matrix as its vertex matrix. For example, the matrix

\[
M = \begin{bmatrix} 0 & 1 & 1 & 0 \\ 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{bmatrix}
\]

determines the directed graph in Figure 10.5.3.
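
The correspondence between vertex matrices and directed graphs can be sketched in a few lines of Python. This is only an illustration: the function names `edges_from_vertex_matrix` and `vertex_matrix_from_edges` are hypothetical, vertices are numbered 1 through n, and the matrix used is the 4 × 4 example just given.

```python
def edges_from_vertex_matrix(M):
    """Return the directed edges (i, j), numbered from 1, encoded by M.

    Raises ValueError if M violates property (i) or (ii) of a vertex matrix.
    """
    n = len(M)
    for i in range(n):
        if len(M[i]) != n:
            raise ValueError("a vertex matrix must be square")
        if any(entry not in (0, 1) for entry in M[i]):
            raise ValueError("property (i): every entry must be 0 or 1")
        if M[i][i] != 0:
            raise ValueError("property (ii): diagonal entries must be 0")
    return [(i + 1, j + 1) for i in range(n) for j in range(n) if M[i][j] == 1]


def vertex_matrix_from_edges(n, edges):
    """Build the n x n vertex matrix from a list of directed edges (i, j)."""
    M = [[0] * n for _ in range(n)]
    for i, j in edges:
        M[i - 1][j - 1] = 1
    return M


# The matrix that determines the directed graph of Figure 10.5.3.
M = [[0, 1, 1, 0],
     [0, 0, 1, 0],
     [1, 0, 0, 1],
     [0, 0, 0, 0]]

edges = edges_from_vertex_matrix(M)
print(edges)

# Going back from the edge list reproduces the same matrix.
round_trip = vertex_matrix_from_edges(4, edges)
print(round_trip == M)
```

Each edge $(i, j)$ in the printed list corresponds to one arrow $P_i \to P_j$ in the diagram, so the list is simply another way of writing down the same directed graph.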


► EXAMPLE 1 Influences Within a Family


A certain family consists of a mother, father, daughter, and two sons. The family members
have influence, or power, over each other in the following ways: the mother can influence
the daughter and the oldest son; the father can influence the two sons; the daughter can
influence the father; the oldest son can influence the youngest son; and the youngest son
can influence the mother. We may model this family influence pattern with a directed
graph whose vertices are the five family members. If family member A influences family
member B, we write $A \to B$. Figure 10.5.4 is the resulting directed graph, where we
have used obvious letter designations for the five family members. The vertex matrix of
this directed graph is

\[
\begin{array}{r|ccccc}
   & M & F & D & OS & YS \\ \hline
M  & 0 & 0 & 1 & 1 & 0 \\
F  & 0 & 0 & 0 & 1 & 1 \\
D  & 0 & 1 & 0 & 0 & 0 \\
OS & 0 & 0 & 0 & 0 & 1 \\
YS & 1 & 0 & 0 & 0 & 0
\end{array}
\]

▲ Figure 10.5.5

▲ Figure 10.5.6

▲ Figure 10.5.7

▲ Figure 10.5.8


EXAMPLE 2 Vertex Matrix: Moves on a Chessboard 

In chess the knight moves in an “L”-shaped pattern about the chessboard. For the 
board in Figure 10.5.5 it may move horizontally two squares and then vertically one 
square, or it may move vertically two squares and then horizontally one square. Thus, 
from the center square in the figure, the knight may move to any of the eight marked 
shaded squares. Suppose that the knight is restricted to the nine numbered squares in 
Figure 10.5.6. If by $i \to j$ we mean that the knight may move from square $i$ to square
j, the directed graph in Figure 10.5.7 illustrates all possible moves that the knight may 
make among these nine squares. In Figure 10.5.8 we have “unraveled” Figure 10.5.7 to 
make the pattern of possible moves clearer. 


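The vertex matrix of this directed graph is given by

\[
M = \begin{bmatrix}
0 & 0 & 0 & 0 & 0 & 1 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 1 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
1 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 1 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}
\]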
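
The knight's vertex matrix can also be generated mechanically rather than read off a diagram. The sketch below assumes the nine squares are numbered row by row (1, 2, 3 in the top row, 4, 5, 6 in the middle, 7, 8, 9 in the bottom); the function name `knight_vertex_matrix` is hypothetical.

```python
def knight_vertex_matrix(rows=3, cols=3):
    """Vertex matrix of knight moves on a rows x cols board.

    Squares are numbered row by row, so square k (1-based) sits at
    row (k - 1) // cols and column (k - 1) % cols.
    """
    # The eight "L"-shaped jumps: two squares one way, one square the other.
    jumps = [(1, 2), (2, 1), (-1, 2), (-2, 1),
             (1, -2), (2, -1), (-1, -2), (-2, -1)]
    n = rows * cols
    M = [[0] * n for _ in range(n)]
    for r in range(rows):
        for c in range(cols):
            for dr, dc in jumps:
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    M[r * cols + c][rr * cols + cc] = 1
    return M


K = knight_vertex_matrix()
for row in K:
    print(row)
```

The fifth row and fifth column come out entirely zero, which reflects the fact that the knight can neither leave nor reach the center square while staying on the nine numbered squares.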


In Example 1 the father cannot directly influence the mother; that is, $F \to M$ is not
true. But he can influence the youngest son, who can then influence the mother. We write
this as $F \to YS \to M$ and call it a 2-step connection from F to M. Analogously, we call
$M \to D$ a 1-step connection, $F \to OS \to YS \to M$ a 3-step connection, and so forth.
Let us now consider a technique for finding the number of all possible r-step connections
($r = 1, 2, \ldots$) from one vertex $P_i$ to another vertex $P_j$ of an arbitrary directed graph.
(This will include the case when $P_i$ and $P_j$ are the same vertex.) The number of 1-step
connections from $P_i$ to $P_j$ is simply $m_{ij}$. That is, there is either zero or one 1-step
connection from $P_i$ to $P_j$, depending on whether $m_{ij}$ is zero or one. For the number
of 2-step connections, we consider the square of the vertex matrix. If we let $m_{ij}^{(2)}$ be the
$(i, j)$-th element of $M^2$, we have

\[
m_{ij}^{(2)} = m_{i1} m_{1j} + m_{i2} m_{2j} + \cdots + m_{in} m_{nj} \tag{1}
\]

Now, if $m_{i1} = m_{1j} = 1$, there is a 2-step connection $P_i \to P_1 \to P_j$ from $P_i$ to $P_j$. But if
either $m_{i1}$ or $m_{1j}$ is zero, such a 2-step connection is not possible. Thus $P_i \to P_1 \to P_j$
is a 2-step connection if and only if $m_{i1} m_{1j} = 1$. Similarly, for any $k = 1, 2, \ldots, n$,
$P_i \to P_k \to P_j$ is a 2-step connection from $P_i$ to $P_j$ if and only if the term $m_{ik} m_{kj}$ on
the right side of (1) is one; otherwise, the term is zero. Thus, the right side of (1) is the
total number of 2-step connections from $P_i$ to $P_j$.

A similar argument will work for finding the number of 3-, 4-, ..., r-step connections
from $P_i$ to $P_j$. In general, we have the following result.


THEOREM 10.5.1 Let $M$ be the vertex matrix of a directed graph and let $m_{ij}^{(r)}$ be the
$(i, j)$-th element of $M^r$. Then $m_{ij}^{(r)}$ is equal to the number of r-step connections from
$P_i$ to $P_j$.

▲ Figure 10.5.9


► EXAMPLE 3 Using Theorem 10.5.1

Figure 10.5.9 is the route map of a small airline that services the four cities $P_1$, $P_2$, $P_3$,
$P_4$. As a directed graph, its vertex matrix is

\[
M = \begin{bmatrix} 0 & 1 & 1 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 0 & 0 & 1 \\ 0 & 1 & 1 & 0 \end{bmatrix}
\]

We have that

\[
M^2 = \begin{bmatrix} 2 & 0 & 1 & 1 \\ 1 & 1 & 1 & 1 \\ 0 & 2 & 2 & 0 \\ 2 & 0 & 1 & 1 \end{bmatrix}
\quad \text{and} \quad
M^3 = \begin{bmatrix} 1 & 3 & 3 & 1 \\ 2 & 2 & 3 & 1 \\ 4 & 0 & 2 & 2 \\ 1 & 3 & 3 & 1 \end{bmatrix}
\]

If we are interested in connections from city $P_4$ to city $P_3$, we may use Theorem 10.5.1 to
find their number. Because $m_{43} = 1$, there is one 1-step connection; because $m_{43}^{(2)} = 1$,
there is one 2-step connection; and because $m_{43}^{(3)} = 3$, there are three 3-step connections.
To verify this, from Figure 10.5.9 we find

1-step connections from $P_4$ to $P_3$:  $P_4 \to P_3$

2-step connections from $P_4$ to $P_3$:  $P_4 \to P_2 \to P_3$

3-step connections from $P_4$ to $P_3$:  $P_4 \to P_3 \to P_4 \to P_3$
                                        $P_4 \to P_2 \to P_1 \to P_3$
                                        $P_4 \to P_3 \to P_1 \to P_3$
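
The counts in Example 3 can be checked by machine. The sketch below is one possible way to do it in plain Python: `mat_mul`, `mat_pow`, and `connections` are hypothetical helper names, and the vertex matrix is assumed to be the 0-1 matrix whose square and cube agree with the $M^2$ and $M^3$ of the example. Indices in the code start at 0, so city $P_4$ is index 3 and city $P_3$ is index 2.

```python
def mat_mul(A, B):
    """Ordinary matrix product of two lists-of-lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def mat_pow(M, r):
    """M raised to the power r, for an integer r >= 1."""
    result = M
    for _ in range(r - 1):
        result = mat_mul(result, M)
    return result


def connections(M, i, j, r):
    """List every r-step connection from vertex i to vertex j (0-based)."""
    if r == 0:
        return [[i]] if i == j else []
    found = []
    for k in range(len(M)):
        if M[i][k] == 1:
            for tail in connections(M, k, j, r - 1):
                found.append([i] + tail)
    return found


# Assumed vertex matrix of the airline route map (cities P1..P4).
M = [[0, 1, 1, 0],
     [1, 0, 1, 0],
     [1, 0, 0, 1],
     [0, 1, 1, 0]]

for r in (1, 2, 3):
    count = mat_pow(M, r)[3][2]        # entry (4, 3) of M^r
    routes = connections(M, 3, 2, r)   # explicit list of r-step routes
    pretty = [" -> ".join(f"P{v + 1}" for v in route) for route in routes]
    print(r, count, pretty)
```

For each r, the entry of $M^r$ and the length of the explicitly enumerated list agree, which is exactly the statement of Theorem 10.5.1 for this pair of cities.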


Cliques

In everyday language a “clique” is a closely knit group of people (usually three or more)
that tends to communicate within itself and has no place for outsiders. In graph theory
this concept is given a more precise meaning.


DEFINITION 1 A subset of a directed graph is called a clique if it satisfies the following
three conditions:

(i) The subset contains at least three vertices.

(ii) For each pair of vertices $P_i$ and $P_j$ in the subset, both $P_i \to P_j$ and $P_j \to P_i$
are true.

(iii) The subset is as large as possible; that is, it is not possible to add another vertex
to the subset and still satisfy condition (ii).


This definition suggests that cliques are maximal subsets that are in perfect
“communication” with each other. For example, if the vertices represent cities, and $P_i \to P_j$
means that there is a direct airline flight from city $P_i$ to city $P_j$, then there is a direct
flight between any two cities within a clique in either direction.

EXAMPLE 4 A Directed Graph with Two Cliques 

The directed graph illustrated in Figure 10.5.10 (which might represent the route map 
of an airline) has two cliques: 

$\{P_1, P_2, P_3, P_4\}$ and $\{P_3, P_4, P_6\}$

This example shows that a directed graph may contain several cliques and that a vertex
may simultaneously belong to more than one clique.

▲ Figure 10.5.11


For simple directed graphs, cliques can be found by inspection. But for large directed
graphs, it would be desirable to have a systematic procedure for detecting cliques. For
this purpose, it will be helpful to define a matrix $S = [s_{ij}]$ related to a given directed
graph as follows:

\[
s_{ij} = \begin{cases} 1, & \text{if } P_i \leftrightarrow P_j \\ 0, & \text{otherwise} \end{cases}
\]

The matrix $S$ determines a directed graph that is the same as the given directed graph,
with the exception that the directed edges with only one arrow are deleted. For example,
if the original directed graph is given by Figure 10.5.11a, the directed graph that has $S$
as its vertex matrix is given in Figure 10.5.11b. The matrix $S$ may be obtained from the
vertex matrix $M$ of the original directed graph by setting $s_{ij} = 1$ if $m_{ij} = m_{ji} = 1$ and
setting $s_{ij} = 0$ otherwise.

The following theorem, which uses the matrix S, is helpful for identifying cliques. 


THEOREM 10.5.2 Identifying Cliques

Let $s_{ij}^{(3)}$ be the $(i, j)$-th element of $S^3$. Then a vertex $P_i$ belongs to some clique if and
only if $s_{ii}^{(3)} \neq 0$.

Proof If $s_{ii}^{(3)} \neq 0$, then there is at least one 3-step connection from $P_i$ to itself in the
modified directed graph determined by $S$. Suppose it is $P_i \to P_j \to P_k \to P_i$. In the
modified directed graph, all directed relations are two-way, so we also have the
connections $P_i \leftrightarrow P_j \leftrightarrow P_k \leftrightarrow P_i$. But this means that $\{P_i, P_j, P_k\}$ is either a clique or a subset
of a clique. In either case, $P_i$ must belong to some clique. The converse statement, “if
$P_i$ belongs to a clique, then $s_{ii}^{(3)} \neq 0$,” follows in a similar manner.
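
The test in Theorem 10.5.2 is easy to mechanize: form $S$ from $M$, cube it, and read the diagonal. The following is a minimal sketch in plain Python. The helper names `two_way_matrix` and `clique_vertices` are hypothetical, and the 5 × 5 matrix is the vertex matrix that appears in Example 6, used here only as sample input.

```python
def mat_mul(A, B):
    """Ordinary matrix product of two lists-of-lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def two_way_matrix(M):
    """The matrix S: s_ij = 1 exactly when m_ij = m_ji = 1."""
    n = len(M)
    return [[1 if M[i][j] == 1 and M[j][i] == 1 else 0 for j in range(n)]
            for i in range(n)]


def clique_vertices(M):
    """Vertices (numbered from 1) whose diagonal entry in S^3 is nonzero."""
    S = two_way_matrix(M)
    S3 = mat_mul(mat_mul(S, S), S)
    return [i + 1 for i in range(len(M)) if S3[i][i] != 0]


# Sample input: the five-vertex matrix of Example 6.
M = [[0, 1, 0, 1, 1],
     [1, 0, 0, 1, 0],
     [1, 1, 0, 1, 0],
     [1, 1, 0, 0, 0],
     [1, 0, 0, 1, 0]]

S = two_way_matrix(M)
print(S)
print(clique_vertices(M))
```

Note that the diagonal test only says which vertices lie in some clique. It does not by itself list the cliques; when several vertices pass the test, the cliques still have to be separated by checking condition (ii) of Definition 1 among them.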


► EXAMPLE 5 Using Theorem 10.5.2

Suppose that a directed graph has as its vertex matrix 


M = 


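Then

\[
S = \begin{bmatrix} 0 & 1 & 0 & 1 \\ 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \end{bmatrix}
\quad \text{and} \quad
S^3 = \begin{bmatrix} 0 & 3 & 0 & 2 \\ 3 & 0 & 2 & 0 \\ 0 & 2 & 0 & 1 \\ 2 & 0 & 1 & 0 \end{bmatrix}
\]

Because all diagonal entries of $S^3$ are zero, it follows from Theorem 10.5.2 that the
directed graph has no cliques.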


► EXAMPLE 6 Using Theorem 10.5.2

Suppose that a directed graph has as its vertex matrix

\[
M = \begin{bmatrix} 0 & 1 & 0 & 1 & 1 \\ 1 & 0 & 0 & 1 & 0 \\ 1 & 1 & 0 & 1 & 0 \\ 1 & 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 1 & 0 \end{bmatrix}
\]

Then

\[
S = \begin{bmatrix} 0 & 1 & 0 & 1 & 1 \\ 1 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 \end{bmatrix}
\quad \text{and} \quad
S^3 = \begin{bmatrix} 2 & 4 & 0 & 4 & 3 \\ 4 & 2 & 0 & 3 & 1 \\ 0 & 0 & 0 & 0 & 0 \\ 4 & 3 & 0 & 2 & 1 \\ 3 & 1 & 0 & 1 & 0 \end{bmatrix}
\]

The nonzero diagonal entries of $S^3$ are $s_{11}^{(3)}$, $s_{22}^{(3)}$, and $s_{44}^{(3)}$. Consequently, in the given
directed graph, $P_1$, $P_2$, and $P_4$ belong to cliques. Because a clique must contain at least
three vertices, the directed graph has only one clique, $\{P_1, P_2, P_4\}$.


Dominance-Directed Graphs

▲ Figure 10.5.12


In many groups of individuals or animals, there is a definite “pecking order” or
dominance relation between any two members of the group. That is, given any two individuals
A and B, either A dominates B or B dominates A, but not both. In terms of a directed
graph in which $P_i \to P_j$ means $P_i$ dominates $P_j$, this means that for all distinct pairs,
either $P_i \to P_j$ or $P_j \to P_i$, but not both. In general, we have the following definition.


DEFINITION 2 A dominance-directed graph is a directed graph such that for any
distinct pair of vertices $P_i$ and $P_j$, either $P_i \to P_j$ or $P_j \to P_i$, but not both.


An example of a directed graph satisfying this definition is a league of n sports teams
that play each other exactly one time, as in one round of a round-robin tournament in
which no ties are allowed. If $P_i \to P_j$ means that team $P_i$ beat team $P_j$ in their single
match, it is easy to see that the definition of a dominance-directed graph is satisfied. For
this reason, dominance-directed graphs are sometimes called tournaments.

Figure 10.5.12 illustrates some dominance-directed graphs with three, four, and five
vertices, respectively. In these three graphs, the circled vertices have the following
interesting property: from each one there is either a 1-step or a 2-step connection to any
other vertex in its graph. In a sports tournament, these vertices would correspond to the
most “powerful” teams in the sense that these teams either beat any given team or beat
some other team that beat the given team. We can now state and prove a theorem that
guarantees that any dominance-directed graph has at least one vertex with this property.


THEOREM 10.5.3 Connections in Dominance-Directed Graphs

In any dominance-directed graph, there is at least one vertex from which there is a 1-step
or 2-step connection to any other vertex.


Proof Consider a vertex (there may be several) with the largest total number of 1-step
and 2-step connections to other vertices in the graph. By renumbering the vertices, we
may assume that $P_1$ is such a vertex. Suppose there is some vertex $P_i$ such that there is no
1-step or 2-step connection from $P_1$ to $P_i$. Then, in particular, $P_1 \to P_i$ is not true, so that
by definition of a dominance-directed graph, it must be that $P_i \to P_1$. Next, let $P_k$ be any
vertex such that $P_1 \to P_k$ is true. Then we cannot have $P_k \to P_i$, as then $P_1 \to P_k \to P_i$
would be a 2-step connection from $P_1$ to $P_i$. Thus, it must be that $P_i \to P_k$. That
is, $P_i$ has 1-step connections to all the vertices to which $P_1$ has 1-step connections.
The vertex $P_i$ must then also have 2-step connections to all the vertices to which $P_1$
has 2-step connections. But because, in addition, we have that $P_i \to P_1$, this means that
$P_i$ has more 1-step and 2-step connections to other vertices than does $P_1$. However, this
contradicts the way in which $P_1$ was chosen. Hence, there can be no vertex $P_i$ to which
$P_1$ has no 1-step or 2-step connection.

This proof shows that a vertex with the largest total number of 1-step and 2-step
connections to other vertices has the property stated in the theorem. There is a simple
way of finding such vertices using the vertex matrix $M$ and its square $M^2$. The sum of
the entries in the $i$th row of $M$ is the total number of 1-step connections from $P_i$ to other
vertices, and the sum of the entries of the $i$th row of $M^2$ is the total number of 2-step
connections from $P_i$ to other vertices. Consequently, the sum of the entries of the $i$th
row of the matrix $A = M + M^2$ is the total number of 1-step and 2-step connections
from $P_i$ to other vertices. In other words, a row of $A = M + M^2$ with the largest row
sum identifies a vertex having the property stated in Theorem 10.5.3.



► EXAMPLE 7 Using Theorem 10.5.3

Suppose that five baseball teams play each other exactly once, and the results are as
indicated in the dominance-directed graph of Figure 10.5.13. The vertex matrix of the
graph is

\[
M = \begin{bmatrix} 0 & 0 & 1 & 1 & 0 \\ 1 & 0 & 1 & 0 & 1 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 1 & 0 & 1 & 1 & 0 \end{bmatrix}
\]

so

\[
A = M + M^2 =
\begin{bmatrix} 0 & 0 & 1 & 1 & 0 \\ 1 & 0 & 1 & 0 & 1 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 1 & 0 & 1 & 1 & 0 \end{bmatrix}
+
\begin{bmatrix} 0 & 1 & 0 & 1 & 0 \\ 1 & 0 & 2 & 3 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 1 & 0 & 1 & 0 & 1 \\ 0 & 1 & 1 & 2 & 0 \end{bmatrix}
=
\begin{bmatrix} 0 & 1 & 1 & 2 & 0 \\ 2 & 0 & 3 & 3 & 1 \\ 0 & 1 & 0 & 1 & 0 \\ 1 & 1 & 1 & 0 & 1 \\ 1 & 1 & 2 & 3 & 0 \end{bmatrix}
\]

The row sums of $A$ are

1st row sum = 4
2nd row sum = 9
3rd row sum = 2
4th row sum = 4
5th row sum = 7

Because the second row has the largest row sum, the vertex $P_2$ must have a 1-step or
2-step connection to any other vertex. This is easily verified from Figure 10.5.13.


We have informally suggested that a vertex with the largest number of 1-step and
2-step connections to other vertices is a “powerful” vertex. We can formalize this concept
with the following definition.

DEFINITION 3 The power of a vertex of a dominance-directed graph is the total
number of 1-step and 2-step connections from it to other vertices. Alternatively, the power
of a vertex $P_i$ is the sum of the entries of the $i$th row of the matrix $A = M + M^2$,
where $M$ is the vertex matrix of the directed graph.


EXAMPLE 8 Example 7 Revisited 

Let us rank the five baseball teams in Example 7 according to their powers. From the 
calculations for the row sums in that example, we have 

Power of team $P_1$ = 4
Power of team $P_2$ = 9
Power of team $P_3$ = 2
Power of team $P_4$ = 4
Power of team $P_5$ = 7

Hence, the ranking of the teams according to their powers would be

$P_2$ (first), $P_5$ (second), $P_1$ and $P_4$ (tied for third), $P_3$ (last)
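
The power computation of Examples 7 and 8 can be reproduced with a short script. This is a sketch in plain Python: the helper names `mat_mul` and `powers` are hypothetical, and the matrix is the vertex matrix of the five-team tournament in Example 7.

```python
def mat_mul(A, B):
    """Ordinary matrix product of two lists-of-lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def powers(M):
    """Row sums of A = M + M^2, one per vertex (Definition 3)."""
    n = len(M)
    M2 = mat_mul(M, M)
    return [sum(M[i][j] + M2[i][j] for j in range(n)) for i in range(n)]


# Vertex matrix of the five-team tournament (teams P1..P5).
M = [[0, 0, 1, 1, 0],
     [1, 0, 1, 0, 1],
     [0, 0, 0, 1, 0],
     [0, 1, 0, 0, 0],
     [1, 0, 1, 1, 0]]

team_powers = powers(M)
print(team_powers)

# Rank teams from most to least powerful; ties keep their original order.
ranking = sorted(range(1, len(M) + 1), key=lambda team: -team_powers[team - 1])
print(ranking)
```

Because Python's `sorted` is stable, the tied teams $P_1$ and $P_4$ appear in numerical order in the ranking; the tie itself is visible in the list of powers.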


Exercise Set 10.5 

1. Construct the vertex matrix for each of the directed graphs 
illustrated in Figure Ex-1. 





A Figure Ex-1 

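2. Draw a diagram of the directed graph corresponding to each
of the following vertex matrices.

(a)
\[
\begin{bmatrix} 0 & 1 & 1 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 1 & 0 & 1 & 0 \end{bmatrix}
\]

(b)
\[
\begin{bmatrix} 0 & 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 & 1 \\ 0 & 1 & 0 & 1 & 1 \\ 0 & 0 & 0 & 0 & 0 \\ 1 & 1 & 1 & 0 & 0 \end{bmatrix}
\]

(c)
\[
\begin{bmatrix} 0 & 1 & 0 & 1 & 0 & 1 \\ 1 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 & 0 & 1 \\ 0 & 1 & 0 & 0 & 1 & 0 \end{bmatrix}
\]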

3. Let $M$ be the following vertex matrix of a directed graph:

\[
\begin{bmatrix} 0 & 1 & 1 & 1 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 1 \\ 0 & 1 & 1 & 0 \end{bmatrix}
\]

(a) Draw a diagram of the directed graph. 

(b) Use Theorem 10.5.1 to find the number of 1-, 2-, and 3-step
connections from the vertex $P_1$ to the vertex $P_2$. Verify your
answer by listing the various connections as in Example 3.

(c) Repeat part (b) for the 1-, 2-, and 3-step connections from
$P_1$ to $P_4$.

4. (a) Compute the matrix product $M^T M$ for the vertex matrix $M$
in Example 1.

(b) Verify that the $k$th diagonal entry of $M^T M$ is the number
of family members who influence the $k$th family member.
Why is this true?

(c) Find a similar interpretation for the values of the
nondiagonal entries of $M^T M$.

5. By inspection, locate all cliques in each of the directed graphs
illustrated in Figure Ex-5.

▲ Figure Ex-5


6. For each of the following vertex matrices, use Theorem 10.5.2 
to find all cliques in the corresponding directed graphs. 








'0 

1 

0 

1 

1 

O ' 

'0 

1 

0 

1 

o ' 


1 

0 

1 

0 

1 

1 

1 

0 

1 

0 

1 


0 

1 

0 

1 

0 

1 

0 

1 

0 

1 

1 

( b ) 










1 

0 

1 

0 

1 

1 

1 

0 

0 

0 

1 


0 

1 

0 

1 

0 

0 

_1 

0 

1 

1 

0 _ 














.0 

0 

1 

1 

1 

0 _ 


7. For the dominance-directed graph illustrated in Figure Ex-7 
construct the vertex matrix and find the power of each vertex. 


▲ Figure Ex-7


8. Five baseball teams play each other one time with the following 
results: 

A beats B, C, D 
B beats C, E 
C beats D, E 
D beats B 
E beats A, D

Rank the five baseball teams in accordance with the powers 
of the vertices they correspond to in the dominance-directed 
graph representing the outcomes of the games. 


Working with Technology 

The following exercises are designed to be solved using a technology
utility. Typically, this will be MATLAB, Mathematica, Maple,
Derive, or Mathcad, but it may also be some other type of linear 
algebra software or a scientific calculator with some linear algebra 
capabilities. For each exercise you will need to read the relevant 
documentation for the particular utility you are using. The goal 
of these exercises is to provide you with a basic proficiency with 
your technology utility. Once you have mastered the techniques 
in these exercises, you will be able to use your technology utility 
to solve many of the problems in the regular exercise sets. 

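T1. A graph having n vertices such that every vertex is connected
to every other vertex has a vertex matrix given by

\[
M_n = \begin{bmatrix}
0 & 1 & 1 & 1 & \cdots & 1 \\
1 & 0 & 1 & 1 & \cdots & 1 \\
1 & 1 & 0 & 1 & \cdots & 1 \\
1 & 1 & 1 & 0 & \cdots & 1 \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\
1 & 1 & 1 & 1 & \cdots & 0
\end{bmatrix}
\]

In this problem we develop a formula for $M_n^k$ whose $(i, j)$-th entry
equals the number of k-step connections from $P_i$ to $P_j$.

(a) Use a computer to compute the eight matrices $M_n^k$ for $n = 2, 3$
and for $k = 2, 3, 4, 5$.

(b) Use the results in part (a) and symmetry arguments to show
that $M_n^k$ can be written as

\[
M_n^k = \begin{bmatrix}
0 & 1 & 1 & \cdots & 1 \\
1 & 0 & 1 & \cdots & 1 \\
1 & 1 & 0 & \cdots & 1 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & 1 & 1 & \cdots & 0
\end{bmatrix}^k
= \begin{bmatrix}
\alpha_k & \beta_k & \beta_k & \cdots & \beta_k \\
\beta_k & \alpha_k & \beta_k & \cdots & \beta_k \\
\beta_k & \beta_k & \alpha_k & \cdots & \beta_k \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\beta_k & \beta_k & \beta_k & \cdots & \alpha_k
\end{bmatrix}
\]

(c) Using the fact that $M_n^k = M_n M_n^{k-1}$, show that

\[
\begin{bmatrix} \alpha_k \\ \beta_k \end{bmatrix}
= \begin{bmatrix} 0 & n-1 \\ 1 & n-2 \end{bmatrix}
\begin{bmatrix} \alpha_{k-1} \\ \beta_{k-1} \end{bmatrix}
\quad \text{with} \quad
\begin{bmatrix} \alpha_1 \\ \beta_1 \end{bmatrix} = \begin{bmatrix} 0 \\ 1 \end{bmatrix}
\]

(d) Using part (c), show that

\[
\begin{bmatrix} \alpha_k \\ \beta_k \end{bmatrix}
= \begin{bmatrix} 0 & n-1 \\ 1 & n-2 \end{bmatrix}^{k-1}
\begin{bmatrix} 0 \\ 1 \end{bmatrix}
\]

(e) Use the methods of Section 5.2 to compute

\[
\begin{bmatrix} 0 & n-1 \\ 1 & n-2 \end{bmatrix}^{k-1}
\]

and thereby obtain expressions for $\alpha_k$ and $\beta_k$, and eventually
show that

\[
M_n^k = \left( \frac{(n-1)^k - (-1)^k}{n} \right) U_n + (-1)^k I_n
\]

where $U_n$ is the n × n matrix all of whose entries are ones and
$I_n$ is the n × n identity matrix.

(f) Show that for $n > 2$, all vertices for these directed graphs
belong to cliques.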

T2. Consider a round-robin tournament among n players (labeled
$a_1, a_2, a_3, \ldots, a_n$) where $a_1$ beats $a_2$, $a_2$ beats $a_3$, $a_3$ beats $a_4$, ...,
$a_{n-1}$ beats $a_n$, and $a_n$ beats $a_1$. Compute the “power” of each
player, showing that they all have the same power; then determine
that common power. [Hint: Use a computer to study the cases
$n = 3, 4, 5, 6$; then make a conjecture and prove your conjecture
to be true.]


10.6 Games of Strategy

In this section we discuss a general game in which two competing players choose separate
strategies to reach opposing objectives. The optimal strategy of each player is found in
certain cases with the use of matrix techniques.

PREREQUISITES: Matrix Multiplication
Basic Probability Concepts


Game Theory

▲ Figure 10.6.1


To introduce the basic concepts in the theory of games, we will consider the following 
carnival-type game that two people agree to play. We will call the participants in the 
game player R and player C. Each player has a stationary wheel with a movable pointer 
on it as in Figure 10.6.1. For reasons that will become clear, we will call player R's wheel 
the row-wheel and player C’s wheel the column-wheel. The row-wheel is divided into 
three sectors numbered 1, 2, and 3, and the column-wheel is divided into four sectors 
numbered 1,2, 3, and 4. The fractions of the area occupied by the various sectors are 
indicated in the figure. To play the game, each player spins the pointer of his or her wheel 
and lets it come to rest at random. The number of the sector in which each pointer comes 
to rest is called the move of that player. Thus, player R has three possible moves and 
player C has four possible moves. Depending on the move each player makes, player C 
then makes a payment of money to player R according to Table 1.


Table 1 Payment to Player R

                          Player C's Move
                        1      2      3      4
Player R's Move   1    $3     $5    -$2    -$1
                  2   -$2     $4    -$3    -$4
                  3    $6    -$5     $0     $3


For example, if the row-wheel pointer comes to rest in sector 1 (player R makes
move 1), and the column-wheel pointer comes to rest in sector 2 (player C makes move 2),
then player C must pay player R the sum of $5. Some of the entries in this table are
negative, indicating that player C makes a negative payment to player R. By this we
mean that player R makes a positive payment to player C. For example, if the row-wheel
shows 2 and the column-wheel shows 4, then player R pays player C the sum of $4,
because the corresponding entry in the table is -$4. In this way the positive entries of
the table are the gains of player R and the losses of player C, and the negative entries
are the gains of player C and the losses of player R.

In this game the players have no control over their moves; each move is determined
by chance. However, if each player can decide whether he or she wants to play, then each
would want to know how much he or she can expect to win or lose over the long term
if he or she chooses to play. (Later in the section we will discuss this question and also
consider a more complicated situation in which the players can exercise some control
over their moves by varying the sectors of their wheels.)


Two-Person Zero-Sum Matrix Games

The game described above is an example of a two-person zero-sum matrix game. The 
term zero-sum means that in each play of the game, the positive gain of one player is 
equal to the negative gain (loss) of the other player. That is, the sum of the two gains is 
zero. The term matrix game is used to describe a two-person game in which each player 
has only a finite number of moves, so that all possible outcomes of each play, and the 
corresponding gains of the players, can be displayed in tabular or matrix form, as in 
Table 1. 

In a general game of this type, let player R have m possible moves and let player C
have n possible moves. In a play of the game, each player makes one of his or her possible
moves, and then a payoff is made from player C to player R, depending on the moves.
For $i = 1, 2, \ldots, m$ and $j = 1, 2, \ldots, n$, let us set

$a_{ij}$ = payoff that player C makes to player R if player R
makes move $i$ and player C makes move $j$

This payoff need not be money; it can be any type of commodity to which we can attach
a numerical value. As before, if an entry $a_{ij}$ is negative, we mean that player C receives
a payoff of $|a_{ij}|$ from player R. We arrange these $mn$ possible payoffs in the form of an
m × n matrix

\[
A = \begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{bmatrix}
\]

which we will call the payoff matrix of the game.

Each player is to make his or her moves on a probabilistic basis. For example, for
the game discussed in the introduction, the ratio of the area of a sector to the area of
the wheel would be the probability that the player makes the move corresponding to
that sector. Thus, from Figure 10.6.1, we see that player R would make move 2 with
probability $\tfrac13$, and player C would make move 2 with probability $\tfrac14$. In the general case
we make the following definitions:

$p_i$ = probability that player R makes move $i$ ($i = 1, 2, \ldots, m$)
$q_j$ = probability that player C makes move $j$ ($j = 1, 2, \ldots, n$)

It follows from these definitions that

\[
p_1 + p_2 + \cdots + p_m = 1
\]

and

\[
q_1 + q_2 + \cdots + q_n = 1
\]

With the probabilities $p_i$ and $q_j$ we form two vectors:

\[
\mathbf{p} = \begin{bmatrix} p_1 & p_2 & \cdots & p_m \end{bmatrix}
\quad \text{and} \quad
\mathbf{q} = \begin{bmatrix} q_1 \\ q_2 \\ \vdots \\ q_n \end{bmatrix}
\]

We call the row vector $\mathbf{p}$ the strategy of player R and the column vector $\mathbf{q}$ the strategy of
player C. For example, from Figure 10.6.1 we have

\[
\mathbf{p} = \begin{bmatrix} \tfrac16 & \tfrac13 & \tfrac12 \end{bmatrix}
\quad \text{and} \quad
\mathbf{q} = \begin{bmatrix} \tfrac14 \\ \tfrac14 \\ \tfrac13 \\ \tfrac16 \end{bmatrix}
\]

for the carnival game described earlier.

From the theory of probability, if the probability that player R makes move $i$ is $p_i$,
and independently the probability that player C makes move $j$ is $q_j$, then $p_i q_j$ is the
probability that for any one play of the game, player R makes move $i$ and player C
makes move $j$. The payoff to player R for such a pair of moves is $a_{ij}$. If we multiply
each possible payoff by its corresponding probability and sum over all possible payoffs,
we obtain the expression

\[
a_{11} p_1 q_1 + a_{12} p_1 q_2 + \cdots + a_{1n} p_1 q_n + a_{21} p_2 q_1 + \cdots + a_{mn} p_m q_n \tag{1}
\]

Equation (1) is a weighted average of the payoffs to player R; each payoff is weighted
according to the probability of its occurrence. In the theory of probability, this weighted
average is called the expected payoff to player R. It can be shown that if the game is
played many times, the long-term average payoff per play to player R is given by this
expression. We denote this expected payoff by $E(\mathbf{p}, \mathbf{q})$ to emphasize the fact that it
depends on the strategies of the two players. From the definition of the payoff matrix $A$
and the strategies $\mathbf{p}$ and $\mathbf{q}$, it can be verified that we may express the expected payoff in
matrix notation as

\[
E(\mathbf{p}, \mathbf{q}) =
\begin{bmatrix} p_1 & p_2 & \cdots & p_m \end{bmatrix}
\begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{bmatrix}
\begin{bmatrix} q_1 \\ q_2 \\ \vdots \\ q_n \end{bmatrix}
= \mathbf{p} A \mathbf{q} \tag{2}
\]

Because $E(\mathbf{p}, \mathbf{q})$ is the expected payoff to player R, it follows that $-E(\mathbf{p}, \mathbf{q})$ is the expected
payoff to player C.
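
Formula (2) can be evaluated exactly with rational arithmetic. The sketch below uses Python's standard `fractions` module; the function name `expected_payoff` is hypothetical, the payoff matrix is taken from Table 1, and the strategies are assumed to be the sector fractions of the carnival wheels, $\mathbf{p} = [\tfrac16, \tfrac13, \tfrac12]$ and $\mathbf{q} = [\tfrac14, \tfrac14, \tfrac13, \tfrac16]$.

```python
from fractions import Fraction as F


def expected_payoff(p, A, q):
    """E(p, q) = p A q, computed as the double sum in expression (1)."""
    return sum(p[i] * A[i][j] * q[j]
               for i in range(len(p)) for j in range(len(q)))


# Payoff matrix from Table 1 (payments from player C to player R).
A = [[3, 5, -2, -1],
     [-2, 4, -3, -4],
     [6, -5, 0, 3]]

# Assumed strategies: the sector fractions of the two wheels.
p = [F(1, 6), F(1, 3), F(1, 2)]
q = [F(1, 4), F(1, 4), F(1, 3), F(1, 6)]

E = expected_payoff(p, A, q)
print(E, float(E))
```

Working with `Fraction` rather than floating-point numbers keeps the result exact, so it can be compared directly with a hand computation; the expected payoff to player C is simply `-E`.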

► EXAMPLE 1 Expected Payoff to Player R

For the carnival game described earlier, we have

\[
E(\mathbf{p}, \mathbf{q}) = \mathbf{p} A \mathbf{q} =
\begin{bmatrix} \tfrac16 & \tfrac13 & \tfrac12 \end{bmatrix}
\begin{bmatrix} 3 & 5 & -2 & -1 \\ -2 & 4 & -3 & -4 \\ 6 & -5 & 0 & 3 \end{bmatrix}
\begin{bmatrix} \tfrac14 \\ \tfrac14 \\ \tfrac13 \\ \tfrac16 \end{bmatrix}
= \tfrac{13}{72} = .1805\ldots
\]

Thus, in the long run, player R can expect to receive an average of about 18 cents from
player C in each play of the game.

So far we have been discussing the situation in which each player has a predetermined
strategy. We will now consider the more difficult situation in which both players
can change their strategies independently. For example, in the game described in the
introduction, we would allow both players to alter the areas of the sectors of their wheels
and thereby control the probabilities of their respective moves. This qualitatively changes
the nature of the problem and puts us firmly in the field of true game theory. It is
understood that neither player knows what strategy the other will choose. It is also assumed
that each player will make the best possible choice of strategy and that the other player
knows this. Thus, player R attempts to choose a strategy $\mathbf{p}$ such that $E(\mathbf{p}, \mathbf{q})$ is as large as
possible for the best strategy $\mathbf{q}$ that player C can choose; and similarly, player C attempts
to choose a strategy $\mathbf{q}$ such that $E(\mathbf{p}, \mathbf{q})$ is as small as possible for the best strategy $\mathbf{p}$ that
player R can choose. To see that such choices are actually possible, we will need the
following theorem, called the Fundamental Theorem of Two-Person Zero-Sum Games. (The
general proof, which involves ideas from the theory of linear programming, will be
omitted. However, below we will prove this theorem for what are called strictly determined
games and 2 × 2 matrix games.)


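THEOREM 10.6.1 Fundamental Theorem of Zero-Sum Games

There exist strategies $\mathbf{p}^*$ and $\mathbf{q}^*$ such that

$$E(\mathbf{p}^*, \mathbf{q}) \ge E(\mathbf{p}^*, \mathbf{q}^*) \ge E(\mathbf{p}, \mathbf{q}^*) \tag{3}$$

for all strategies $\mathbf{p}$ and $\mathbf{q}$.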


The strategies p* and q* in this theorem are the best possible strategies for players R
and C, respectively. To see why this is so, let $v = E(\mathbf{p}^*, \mathbf{q}^*)$. The left-hand inequality of
Equation (3) then reads

$$E(\mathbf{p}^*, \mathbf{q}) \ge v \quad \text{for all strategies } \mathbf{q}$$

This means that if player R chooses the strategy p*, then no matter what strategy q 
player C chooses, the expected payoff to player R will never be below v. Moreover, it 
is not possible for player R to achieve an expected payoff greater than v. To see why, 
suppose there is some strategy p** that player R can choose such that 

$$E(\mathbf{p}^{**}, \mathbf{q}) > v \quad \text{for all strategies } \mathbf{q}$$

Then, in particular,

$$E(\mathbf{p}^{**}, \mathbf{q}^*) > v$$

But this contradicts the right-hand inequality of Equation (3), which requires that
$v \ge E(\mathbf{p}^{**}, \mathbf{q}^*)$. Consequently, the best player R can do is prevent his or her expected
payoff from falling below the value v. Similarly, the best player C can do is ensure 
that player R’s expected payoff does not exceed v, and this can be achieved by using 
strategy q*. 

On the basis of this discussion, we arrive at the following definitions. 


DEFINITION 1 If $\mathbf{p}^*$ and $\mathbf{q}^*$ are strategies such that

$$E(\mathbf{p}^*, \mathbf{q}) \ge E(\mathbf{p}^*, \mathbf{q}^*) \ge E(\mathbf{p}, \mathbf{q}^*) \tag{4}$$

for all strategies $\mathbf{p}$ and $\mathbf{q}$, then

(i) $\mathbf{p}^*$ is called an optimal strategy for player R.

(ii) $\mathbf{q}^*$ is called an optimal strategy for player C.

(iii) $v = E(\mathbf{p}^*, \mathbf{q}^*)$ is called the value of the game.

The wording in this definition suggests that optimal strategies are not necessarily unique.
This is indeed the case, and in Exercise 2 we ask you to show this. However, it can be 
proved that any two sets of optimal strategies always result in the same value v of the 
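game. That is, if $\mathbf{p}^*, \mathbf{q}^*$ and $\mathbf{p}^{**}, \mathbf{q}^{**}$ are optimal strategies, then

$$E(\mathbf{p}^*, \mathbf{q}^*) = E(\mathbf{p}^{**}, \mathbf{q}^{**}) \tag{5}$$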

The value of a game is thus the expected payoff to player R when both players choose 
any possible optimal strategies. 

To find optimal strategies, we must find vectors p* and q* that satisfy Equation (4). 
This is generally done by using linear programming techniques. Next, we discuss special 
cases for which optimal strategies may be found by more elementary techniques. 

We now introduce the following definition. 


DEFINITION 2 An entry $a_{rs}$ in a payoff matrix A is called a saddle point if

(i) $a_{rs}$ is the smallest entry in its row, and

(ii) $a_{rs}$ is the largest entry in its column.

A game whose payoff matrix has a saddle point is called strictly determined.


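For example, the boxed element in each of the following payoff matrices is a saddle
point:

$$\begin{bmatrix} 3 & \boxed{1} \\ -4 & 0 \end{bmatrix} \qquad
\begin{bmatrix} 30 & -50 & -5 \\ \boxed{60} & 90 & 75 \\ -10 & 60 & -30 \end{bmatrix} \qquad
\begin{bmatrix} 0 & -3 & 5 & -9 \\ 15 & -8 & -2 & 10 \\ 7 & 10 & \boxed{6} & 9 \\ 6 & 11 & -3 & 2 \end{bmatrix}$$
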

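If a matrix has a saddle point $a_{rs}$, it turns out that the following strategies are optimal
strategies for the two players:

$$\mathbf{p}^* = \begin{bmatrix} 0 & 0 & \cdots & 1 & \cdots & 0 \end{bmatrix} \ (\text{1 in the } r\text{th entry}), \qquad
\mathbf{q}^* = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 1 \\ \vdots \\ 0 \end{bmatrix} \ (\text{1 in the } s\text{th entry})$$

That is, an optimal strategy for player R is to always make the rth move, and an optimal
strategy for player C is to always make the sth move. Such strategies for which only one
move is possible are called pure strategies. Strategies for which more than one move is
possible are called mixed strategies. To show that the above pure strategies are optimal,
you can verify the following three equations (see Exercise 6):

$$E(\mathbf{p}^*, \mathbf{q}^*) = \mathbf{p}^*A\mathbf{q}^* = a_{rs} \tag{6}$$

$$E(\mathbf{p}^*, \mathbf{q}) = \mathbf{p}^*A\mathbf{q} \ge a_{rs} \quad \text{for any strategy } \mathbf{q} \tag{7}$$

$$E(\mathbf{p}, \mathbf{q}^*) = \mathbf{p}A\mathbf{q}^* \le a_{rs} \quad \text{for any strategy } \mathbf{p} \tag{8}$$

Together, these three equations imply that

$$E(\mathbf{p}^*, \mathbf{q}) \ge E(\mathbf{p}^*, \mathbf{q}^*) \ge E(\mathbf{p}, \mathbf{q}^*)$$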

for all strategies p and q. Because this is exactly Equation (4), it follows that p* and q*
are optimal strategies.

From Equation (6) the value of a strictly determined game is simply the numerical
value of a saddle point $a_{rs}$. It is possible for a payoff matrix to have several saddle points,
but then the uniqueness of the value of a game guarantees that the numerical values of
all saddle points are the same.

► EXAMPLE 2 Optimal Strategies to Maximize a Viewing Audience

Two competing television networks, R and C, are scheduling one-hour programs in the 
same time period. Network R can schedule one of three possible programs, and network 
C can schedule one of four possible programs. Neither network knows which program 
the other will schedule. Both networks ask the same outside polling agency to give them 
an estimate of how all possible pairings of the programs will divide the viewing audience. 
The agency gives them each Table 2, whose $(i, j)$th entry is the percentage of the viewing
audience that will watch network R if network R’s program i is paired against network
C’s program j. What program should each network schedule in order to maximize its
viewing audience?


Table 2 Audience Percentage for Network R

                              Network C's Program
                              1      2      3      4
Network R's Program    1      60     20     30     55
                       2      50     75     45     60
                       3      70     45     35     30


Solution Subtract 50 from each entry in Table 2 to construct the following matrix:

$$\begin{bmatrix} 10 & -30 & -20 & 5 \\ 0 & 25 & -5 & 10 \\ 20 & -5 & -15 & -20 \end{bmatrix}$$

This is the payoff matrix of the two-person zero-sum game in which each network is
considered to start with 50% of the audience, and the $(i, j)$th entry of the matrix is the
percentage of the viewing audience that network C loses to network R if programs i and
j are paired against each other. It is easy to see that the entry

$$a_{23} = -5$$

is a saddle point of the payoff matrix. Hence, the optimal strategy of network R is to
schedule program 2, and the optimal strategy of network C is to schedule program 3.
This will result in network R's receiving 45% of the audience and network C’s receiving
55% of the audience.
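
The search for a saddle point in Example 2 can be automated by testing Definition 2 entry by entry. The sketch below is one straightforward way to do this in Python; the function name `saddle_points` and the 0-based index convention are illustrative assumptions, not part of the text.

```python
def saddle_points(A):
    """Return all (row, column) index pairs (0-based) of saddle points of A."""
    points = []
    for r, row in enumerate(A):
        for s, entry in enumerate(row):
            column = [A[i][s] for i in range(len(A))]
            # Definition 2: smallest entry in its row, largest in its column.
            if entry == min(row) and entry == max(column):
                points.append((r, s))
    return points

# Table 2 with 50 subtracted from each entry (Example 2).
table2 = [[60, 20, 30, 55],
          [50, 75, 45, 60],
          [70, 45, 35, 30]]
A = [[entry - 50 for entry in row] for row in table2]

points = saddle_points(A)
print(points)                        # [(1, 2)]: row 2, column 3 in 1-based numbering
print([A[r][s] for r, s in points])  # [-5]
```

An empty result means the game is not strictly determined, in which case pure strategies are not optimal and other techniques are needed.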

2 × 2 Matrix Games

Another case in which the optimal strategies can be found by elementary means occurs
when each player has only two possible moves. In this case, the payoff matrix is a 2 × 2
matrix

$$A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}$$

If the game is strictly determined, at least one of the four entries of A is a saddle point,
and the techniques discussed above can then be applied to determine optimal strategies 
for the two players. If the game is not strictly determined, we first compute the expected 
payoff for arbitrary strategies p and q: 



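$$E(\mathbf{p}, \mathbf{q}) = \begin{bmatrix} p_1 & p_2 \end{bmatrix} \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} \begin{bmatrix} q_1 \\ q_2 \end{bmatrix}
= a_{11}p_1q_1 + a_{12}p_1q_2 + a_{21}p_2q_1 + a_{22}p_2q_2 \tag{9}$$

Because

$$p_1 + p_2 = 1 \quad \text{and} \quad q_1 + q_2 = 1 \tag{10}$$

we may substitute $p_2 = 1 - p_1$ and $q_2 = 1 - q_1$ into (9) to obtain

$$E(\mathbf{p}, \mathbf{q}) = a_{11}p_1q_1 + a_{12}p_1(1 - q_1) + a_{21}(1 - p_1)q_1 + a_{22}(1 - p_1)(1 - q_1) \tag{11}$$

If we rearrange the terms in Equation (11), we can write

$$E(\mathbf{p}, \mathbf{q}) = [(a_{11} + a_{22} - a_{12} - a_{21})p_1 - (a_{22} - a_{21})]q_1 + (a_{12} - a_{22})p_1 + a_{22} \tag{12}$$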

By examining the coefficient of the $q_1$ term in (12), we see that if we set

$$p_1 = p_1^* = \frac{a_{22} - a_{21}}{a_{11} + a_{22} - a_{12} - a_{21}} \tag{13}$$

then that coefficient is zero, and (12) reduces to

$$E(\mathbf{p}^*, \mathbf{q}) = \frac{a_{11}a_{22} - a_{12}a_{21}}{a_{11} + a_{22} - a_{12} - a_{21}} \tag{14}$$


Equation (14) is independent of q; that is, if player R chooses the strategy determined 
by (13), player C cannot change the expected payoff by varying his or her strategy. 


In a similar manner, it can be verified that if player C chooses the strategy determined 
by 


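$$q_1 = q_1^* = \frac{a_{22} - a_{12}}{a_{11} + a_{22} - a_{12} - a_{21}} \tag{15}$$

then substituting in (12) gives

$$E(\mathbf{p}, \mathbf{q}^*) = \frac{a_{11}a_{22} - a_{12}a_{21}}{a_{11} + a_{22} - a_{12} - a_{21}} \tag{16}$$

Equations (14) and (16) show that

$$E(\mathbf{p}^*, \mathbf{q}) = E(\mathbf{p}^*, \mathbf{q}^*) = E(\mathbf{p}, \mathbf{q}^*) \tag{17}$$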


for all strategies p and q. Thus, the strategies determined by (13), (15), and (10) are
optimal strategies for players R and C, respectively, and so we have the following result.

THEOREM 10.6.2 Optimal Strategies for a 2 × 2 Matrix Game

For a 2 × 2 game that is not strictly determined, optimal strategies for players R and C
are

$$\mathbf{p}^* = \begin{bmatrix} \dfrac{a_{22} - a_{21}}{a_{11} + a_{22} - a_{12} - a_{21}} & \dfrac{a_{11} - a_{12}}{a_{11} + a_{22} - a_{12} - a_{21}} \end{bmatrix}$$

and

$$\mathbf{q}^* = \begin{bmatrix} \dfrac{a_{22} - a_{12}}{a_{11} + a_{22} - a_{12} - a_{21}} \\[2ex] \dfrac{a_{11} - a_{21}}{a_{11} + a_{22} - a_{12} - a_{21}} \end{bmatrix}$$

The value of the game is

$$v = \frac{a_{11}a_{22} - a_{12}a_{21}}{a_{11} + a_{22} - a_{12} - a_{21}}$$


In order to be complete, we must show that the entries in the vectors p* and q* are 
numbers strictly between 0 and 1 . In Exercise 8 we ask you to show that this is the case 
as long as the game is not strictly determined. 

Equation (17) is interesting in that it implies that either player can force the expected 
payoff to be the value of the game by choosing his or her optimal strategy, regardless of 
which strategy the other player chooses. This is not true, in general, for games in which 
either player has more than two moves. 


EXAMPLE 3 Using Theorem 10.6.2 

The federal government desires to inoculate its citizens against a certain flu virus. The 
virus has two strains, and the proportions in which the two strains occur in the virus 
population is not known. Two vaccines have been developed and each citizen is given 
only one of them. Vaccine 1 is 85% effective against strain 1 and 70% effective against 
strain 2. Vaccine 2 is 60% effective against strain 1 and 90% effective against strain 2. 
What inoculation policy should the government adopt? 


Solution We can consider this a two-person game in which player R (the government) 
desires to make the payoff (the fraction of citizens resistant to the virus) as large as 
possible, and player C (the virus) desires to make the payoff as small as possible. The 
payoff matrix is 

                    Strain
                  1       2
Vaccine    1   [ .85     .70 ]
           2   [ .60     .90 ]

This matrix has no saddle points, so Theorem 10.6.2 is applicable. Consequently,

$$p_1^* = \frac{a_{22} - a_{21}}{a_{11} + a_{22} - a_{12} - a_{21}} = \frac{.90 - .60}{.85 + .90 - .70 - .60} = \frac{.30}{.45} = \frac{2}{3}$$

$$p_2^* = 1 - p_1^* = 1 - \frac{2}{3} = \frac{1}{3}$$

$$q_1^* = \frac{a_{22} - a_{12}}{a_{11} + a_{22} - a_{12} - a_{21}} = \frac{.90 - .70}{.85 + .90 - .70 - .60} = \frac{.20}{.45} = \frac{4}{9}$$

$$q_2^* = 1 - q_1^* = 1 - \frac{4}{9} = \frac{5}{9}$$

$$v = \frac{a_{11}a_{22} - a_{12}a_{21}}{a_{11} + a_{22} - a_{12} - a_{21}} = \frac{(.85)(.90) - (.70)(.60)}{.85 + .90 - .70 - .60} = \frac{.345}{.45} = .7666\ldots$$

Thus, the optimal strategy for the government is to inoculate 2/3 of the citizens with vaccine
1 and 1/3 of the citizens with vaccine 2. This will guarantee that about 76.7% of the citizens
will be resistant to a virus attack regardless of the distribution of the two strains.

In contrast, a virus distribution of 4/9 of strain 1 and 5/9 of strain 2 will result in the
same 76.7% of resistant citizens, regardless of the inoculation strategy adopted by the
government (see Exercise 7).
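
The formulas of Theorem 10.6.2 translate directly into a few lines of code. The following Python sketch applies them to the vaccine data of Example 3 using exact fractions; the function name `solve_2x2_game` is a hypothetical choice, and the function assumes the caller has already confirmed that the payoff matrix has no saddle point.

```python
from fractions import Fraction as F

def solve_2x2_game(a11, a12, a21, a22):
    """Optimal strategies and value of a 2 x 2 game with no saddle point.

    Implements the formulas of Theorem 10.6.2; the caller is assumed to
    have checked that the game is not strictly determined.
    """
    d = a11 + a22 - a12 - a21
    p_star = [(a22 - a21) / d, (a11 - a12) / d]
    q_star = [(a22 - a12) / d, (a11 - a21) / d]
    value = (a11 * a22 - a12 * a21) / d
    return p_star, q_star, value

# Vaccine effectiveness from Example 3, written as exact fractions.
p_star, q_star, value = solve_2x2_game(F(85, 100), F(70, 100),
                                       F(60, 100), F(90, 100))
print(p_star)               # [Fraction(2, 3), Fraction(1, 3)]
print(q_star)               # [Fraction(4, 9), Fraction(5, 9)]
print(value, float(value))  # 23/30, about 0.7667
```

For a strictly determined game these formulas need not give valid probabilities, so a saddle-point check should come first.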


Exercise Set 10.6 


1. Suppose that a game has a payoff matrix

$$A = \begin{bmatrix} -4 & 6 & -4 & 1 \\ 5 & -7 & 3 & 8 \\ -8 & 0 & 6 & -2 \end{bmatrix}$$

(a) If players R and C use strategies

$$\mathbf{p} = \begin{bmatrix} \tfrac{1}{2} & 0 & \tfrac{1}{2} \end{bmatrix} \quad \text{and} \quad \mathbf{q} = \begin{bmatrix} \tfrac{1}{4} \\ \tfrac{1}{4} \\ \tfrac{1}{4} \\ \tfrac{1}{4} \end{bmatrix}$$

respectively, what is the expected payoff of the game?

(b) If player C keeps his strategy fixed as in part (a), what
strategy should player R choose to maximize his expected
payoff?

(c) If player R keeps her strategy fixed as in part (a), what strategy
should player C choose to minimize the expected payoff
to player R?


2. Construct a simple example to show that optimal strategies are 
not necessarily unique. For example, find a payoff matrix with 
several equal saddle points. 


3. For the strictly determined games with the following payoff 
matrices, find optimal strategies for the two players, and find 
the values of the games. 


(a) $\begin{bmatrix} 5 & 2 \\ 7 & 3 \end{bmatrix}$

(b) $\begin{bmatrix} -3 & -2 \\ 2 & 4 \\ -4 & 1 \end{bmatrix}$


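4. For the 2 × 2 games with the following payoff matrices, find
optimal strategies for the two players, and find the values of the
games.

(a) $\begin{bmatrix} 6 & 3 \\ -1 & 4 \end{bmatrix}$

(b) $\begin{bmatrix} 40 & 20 \\ -10 & 30 \end{bmatrix}$

(c) $\begin{bmatrix} 3 & 7 \\ -5 & 4 \end{bmatrix}$

(d) $\begin{bmatrix} 3 & 5 \\ 5 & 2 \end{bmatrix}$

(e) $\begin{bmatrix} 7 & -3 \\ -5 & -2 \end{bmatrix}$
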

5. Player R has two playing cards: a black ace and a red four. 
Player C also has two cards: a black two and a red three. Each 
player secretly selects one of his or her cards. If both selected 
cards are the same color, player C pays player R the sum of the 
face values in dollars. If the cards are different colors, player 
R pays player C the sum of the face values. What are optimal 
strategies for both players, and what is the value of the game? 

6. Verify Equations (6), (7), and (8). 

7. Verify the statement in the last paragraph of Example 3. 

8. Show that the entries of the optimal strategies p* and q* given 
in Theorem 10.6.2 are numbers strictly between zero and one. 


Working with Technology 

The following exercises are designed to be solved using a technology
utility. Typically, this will be MATLAB, Mathematica, Maple,
Derive, or Mathcad, but it may also be some other type of linear 
algebra software or a scientific calculator with some linear algebra 
capabilities. For each exercise you will need to read the relevant 
documentation for the particular utility you are using. The goal 
of these exercises is to provide you with a basic proficiency with 
your technology utility. Once you have mastered the techniques 
in these exercises, you will be able to use your technology utility 
to solve many of the problems in the regular exercise sets. 


T1. Consider a game between two players where each player can
make up to n different moves (n > 1). If the ith move of player
R and the jth move of player C are such that i + j is even, then
C pays R $1. If i + j is odd, then R pays C $1. Assume that
both players have the same strategy, that is, $\mathbf{p}_n = [p_i]_{1 \times n}$ and
$\mathbf{q}_n = [p_i]_{n \times 1}$, where $p_1 + p_2 + p_3 + \cdots + p_n = 1$. Use a computer
to show that

$$\begin{aligned}
E(\mathbf{p}_2, \mathbf{q}_2) &= (p_1 - p_2)^2 \\
E(\mathbf{p}_3, \mathbf{q}_3) &= (p_1 - p_2 + p_3)^2 \\
E(\mathbf{p}_4, \mathbf{q}_4) &= (p_1 - p_2 + p_3 - p_4)^2 \\
E(\mathbf{p}_5, \mathbf{q}_5) &= (p_1 - p_2 + p_3 - p_4 + p_5)^2
\end{aligned}$$

Using these results as a guide, prove in general that the expected
payoff to player R is

$$E(\mathbf{p}_n, \mathbf{q}_n) = \left( \sum_{j=1}^{n} (-1)^{j+1} p_j \right)^2$$

which shows that in the long run, player R will not lose in this
game.


T2. Consider a game between two players where each player can
make up to n different moves (n > 1). If both players make the
same move, then player C pays player R $(n − 1). However, if
both players make different moves, then player R pays player C
$1. Assume that both players have the same strategy, that is,
$\mathbf{p}_n = [p_i]_{1 \times n}$ and $\mathbf{q}_n = [p_i]_{n \times 1}$, where $p_1 + p_2 + p_3 + \cdots + p_n = 1$.
Use a computer to show that

$$\begin{aligned}
E(\mathbf{p}_2, \mathbf{q}_2) = {} & \tfrac{1}{2}(p_1 - p_1)^2 + \tfrac{1}{2}(p_1 - p_2)^2 + \tfrac{1}{2}(p_2 - p_1)^2 + \tfrac{1}{2}(p_2 - p_2)^2 \\[1ex]
E(\mathbf{p}_3, \mathbf{q}_3) = {} & \tfrac{1}{2}(p_1 - p_1)^2 + \tfrac{1}{2}(p_1 - p_2)^2 + \tfrac{1}{2}(p_1 - p_3)^2 \\
& + \tfrac{1}{2}(p_2 - p_1)^2 + \tfrac{1}{2}(p_2 - p_2)^2 + \tfrac{1}{2}(p_2 - p_3)^2 \\
& + \tfrac{1}{2}(p_3 - p_1)^2 + \tfrac{1}{2}(p_3 - p_2)^2 + \tfrac{1}{2}(p_3 - p_3)^2 \\[1ex]
E(\mathbf{p}_4, \mathbf{q}_4) = {} & \tfrac{1}{2}(p_1 - p_1)^2 + \tfrac{1}{2}(p_1 - p_2)^2 + \tfrac{1}{2}(p_1 - p_3)^2 + \tfrac{1}{2}(p_1 - p_4)^2 \\
& + \tfrac{1}{2}(p_2 - p_1)^2 + \tfrac{1}{2}(p_2 - p_2)^2 + \tfrac{1}{2}(p_2 - p_3)^2 + \tfrac{1}{2}(p_2 - p_4)^2 \\
& + \tfrac{1}{2}(p_3 - p_1)^2 + \tfrac{1}{2}(p_3 - p_2)^2 + \tfrac{1}{2}(p_3 - p_3)^2 + \tfrac{1}{2}(p_3 - p_4)^2 \\
& + \tfrac{1}{2}(p_4 - p_1)^2 + \tfrac{1}{2}(p_4 - p_2)^2 + \tfrac{1}{2}(p_4 - p_3)^2 + \tfrac{1}{2}(p_4 - p_4)^2
\end{aligned}$$

Using these results as a guide, prove in general that the expected
payoff to player R is

$$E(\mathbf{p}_n, \mathbf{q}_n) = \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} (p_i - p_j)^2 \ge 0$$

which shows that in the long run, player R will not lose in this
game.


10.7 Leontief Economic Models 

In this section we discuss two linear models for economic systems. Some results about 
nonnegative matrices are applied to determine equilibrium price structures and outputs 
necessary to satisfy demand. 

PREREQUISITES: Linear Systems
               Matrices

Economic Systems

Matrix theory has been very successful in describing the interrelations among prices,
outputs, and demands in economic systems. In this section we discuss some simple
models based on the ideas of Nobel laureate Wassily Leontief. We examine two different
but related models: the closed or input-output model, and the open or production
model. In each, we are given certain economic parameters that describe the interrelations
between the “industries” in the economy under consideration. Using matrix theory, we
then evaluate certain other parameters, such as prices or output levels, in order to satisfy
a desired economic objective. We begin with the closed model.

Leontief Closed (Input-Output) Model

First we present a simple example; then we proceed to the general theory of the model.


► EXAMPLE 1 An Input-Output Model

Three homeowners (a carpenter, an electrician, and a plumber) agree to make repairs
in their three homes. They agree to work a total of 10 days each according to the following
schedule:

                                         Work Performed by
                                         Carpenter   Electrician   Plumber
Days of Work in Home of Carpenter            2            1           6
Days of Work in Home of Electrician          4            5           1
Days of Work in Home of Plumber              4            4           3


For tax purposes, they must report and pay each other a reasonable daily wage, even 
for the work each does on his or her own home. Their normal daily wages are about 
$100, but they agree to adjust their respective daily wages so that each homeowner will 
come out even — that is, so that the total amount paid out by each is the same as the total 
amount each receives. We can set 


$p_1$ = daily wage of carpenter
$p_2$ = daily wage of electrician
$p_3$ = daily wage of plumber


To satisfy the “equilibrium” condition that each homeowner comes out even, we require 
that 

total expenditures = total income 

for each of the homeowners for the 10-day period. For example, the carpenter pays a
total of $2p_1 + p_2 + 6p_3$ for the repairs in his own home and receives a total income of
$10p_1$ for the repairs that he performs on all three homes. Equating these two expressions
then gives the first of the following three equations:

$$\begin{aligned}
2p_1 + p_2 + 6p_3 &= 10p_1 \\
4p_1 + 5p_2 + p_3 &= 10p_2 \\
4p_1 + 4p_2 + 3p_3 &= 10p_3
\end{aligned}$$

The remaining two equations are the equilibrium equations for the electrician and the
plumber. Dividing these equations by 10 and rewriting them in matrix form yields

$$\begin{bmatrix} .2 & .1 & .6 \\ .4 & .5 & .1 \\ .4 & .4 & .3 \end{bmatrix} \begin{bmatrix} p_1 \\ p_2 \\ p_3 \end{bmatrix} = \begin{bmatrix} p_1 \\ p_2 \\ p_3 \end{bmatrix} \tag{1}$$


Equation (1) can be rewritten as a homogeneous system by subtracting the left side from
the right side to obtain

$$\begin{bmatrix} .8 & -.1 & -.6 \\ -.4 & .5 & -.1 \\ -.4 & -.4 & .7 \end{bmatrix} \begin{bmatrix} p_1 \\ p_2 \\ p_3 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}$$

The solution of this homogeneous system is found to be (verify)

$$\begin{bmatrix} p_1 \\ p_2 \\ p_3 \end{bmatrix} = s \begin{bmatrix} 31 \\ 32 \\ 36 \end{bmatrix}$$

where s is an arbitrary constant. This constant is a scale factor, which the homeowners
may choose for their convenience. For example, they can set s = 3 so that the corresponding
daily wages ($93, $96, and $108) are about $100.
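
The wages found above can be cross-checked numerically. The sketch below does not repeat the elimination used in the example; it swaps in a different technique, repeated multiplication by the exchange matrix (power iteration), which relies on the Markov-chain behavior of a column-stochastic matrix with positive entries. The function name `equilibrium_prices` and the iteration count are illustrative assumptions.

```python
def equilibrium_prices(E, iterations=200):
    """Approximate a solution of E p = p by repeatedly applying E.

    Assumes E is an exchange matrix (columns sum to 1) with positive
    entries, so that the iteration settles down to a fixed vector.
    """
    k = len(E)
    p = [1.0] * k
    for _ in range(iterations):
        p = [sum(E[i][j] * p[j] for j in range(k)) for i in range(k)]
    return p

# Exchange matrix of Example 1 (fractions of each worker's 10 days).
E = [[0.2, 0.1, 0.6],
     [0.4, 0.5, 0.1],
     [0.4, 0.4, 0.3]]

p = equilibrium_prices(E)
scale = 31 / p[0]                      # rescale so the first entry is 31
wages = [scale * value for value in p]
print([round(w, 6) for w in wages])    # close to [31.0, 32.0, 36.0]
```

Because any positive multiple of a solution is again a solution, the final rescaling plays the role of the scale factor s in the example.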
This example illustrates the salient features of the Leontief input-output model of a
closed economy. In the basic Equation (1), each column sum of the coefficient matrix is 1,
corresponding to the fact that each of the homeowners’ “output” of labor is completely 
distributed among these same homeowners in the proportions given by the entries in the 
column. Our problem is to determine suitable “prices” for these outputs so as to put the 
system in equilibrium — that is, so that each homeowner’s total expenditures equal his or 
her total income. 

In the general model we have an economic system consisting of a finite number of 
“industries,” which we number as industries 1, 2, ..., k. Over some fixed period of time,
each industry produces an “output” of some good or service that is completely utilized 
in a predetermined manner by the k industries. An important problem is to find suitable 
“prices” to be charged for these k outputs so that for each industry, total expenditures 
equal total income. Such a price structure represents an equilibrium position for the 
economy. 

For the fixed time period in question, let us set 

$p_i$ = price charged by the ith industry for its total output
$e_{ij}$ = fraction of the total output of the jth industry purchased by the
ith industry


for $i, j = 1, 2, \ldots, k$. By definition, we have


(i) $p_i \ge 0$, $i = 1, 2, \ldots, k$

(ii) $e_{ij} \ge 0$, $i, j = 1, 2, \ldots, k$

(iii) $e_{1j} + e_{2j} + \cdots + e_{kj} = 1$, $j = 1, 2, \ldots, k$


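With these quantities, we form the price vector

$$\mathbf{p} = \begin{bmatrix} p_1 \\ p_2 \\ \vdots \\ p_k \end{bmatrix}$$

and the exchange matrix or input-output matrix

$$E = \begin{bmatrix} e_{11} & e_{12} & \cdots & e_{1k} \\ e_{21} & e_{22} & \cdots & e_{2k} \\ \vdots & \vdots & & \vdots \\ e_{k1} & e_{k2} & \cdots & e_{kk} \end{bmatrix}$$

Condition (iii) expresses the fact that all the column sums of the exchange matrix are 1.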

As in the example, in order that the expenditures of each industry be equal to its 
income, the following matrix equation must be satisfied [see (1)]: 


$$E\mathbf{p} = \mathbf{p} \tag{2}$$

or

$$(I - E)\mathbf{p} = \mathbf{0} \tag{3}$$

Equation (3) is a homogeneous linear system for the price vector p. It will have a
nontrivial solution if and only if the determinant of its coefficient matrix $I - E$ is zero.
In Exercise 7 we ask you to show that this is the case for any exchange matrix E. Thus,
(3) always has nontrivial solutions for the price vector p.

Actually, for our economic model to make sense, we need more than just the fact
that (3) has nontrivial solutions for p. We also need the prices $p_i$ of the k outputs to
be nonnegative numbers. We express this condition as $\mathbf{p} \ge 0$. (In general, if A is any
vector or matrix, the notation $A \ge 0$ means that every entry of A is nonnegative, and
the notation $A > 0$ means that every entry of A is positive. Similarly, $A \ge B$ means
$A - B \ge 0$, and $A > B$ means $A - B > 0$.) To show that (3) has a nontrivial solution
for which $\mathbf{p} \ge 0$ is a bit more difficult than showing merely that some nontrivial solution
exists. But it is true, and we state this fact without proof in the following theorem.


THEOREM 10.7.1 If E is an exchange matrix, then $E\mathbf{p} = \mathbf{p}$ always has a nontrivial
solution p whose entries are nonnegative.


Let us consider a few simple examples of this theorem. 


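► EXAMPLE 2 Using Theorem 10.7.1

Let

$$E = \begin{bmatrix} \tfrac{1}{2} & 0 \\ \tfrac{1}{2} & 1 \end{bmatrix}$$

Then $(I - E)\mathbf{p} = \mathbf{0}$ is

$$\begin{bmatrix} \tfrac{1}{2} & 0 \\ -\tfrac{1}{2} & 0 \end{bmatrix} \begin{bmatrix} p_1 \\ p_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$$

which has the general solution

$$\mathbf{p} = s \begin{bmatrix} 0 \\ 1 \end{bmatrix}$$

where s is an arbitrary constant. We then have nontrivial solutions $\mathbf{p} \ge 0$ for any $s > 0$.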


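► EXAMPLE 3 Using Theorem 10.7.1

Let

$$E = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$$

Then $(I - E)\mathbf{p} = \mathbf{0}$ has the general solution

$$\mathbf{p} = s \begin{bmatrix} 1 \\ 0 \end{bmatrix} + t \begin{bmatrix} 0 \\ 1 \end{bmatrix}$$

where s and t are independent arbitrary constants. Nontrivial solutions $\mathbf{p} \ge 0$ then result
from any $s \ge 0$ and $t \ge 0$, not both zero.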


Example 2 indicates that in some situations one of the prices must be zero in order 
to satisfy the equilibrium condition. Example 3 indicates that there may be several lin- 
early independent price structures available. Neither of these situations describes a truly 
interdependent economic structure. The following theorem gives sufficient conditions 
for both cases to be excluded. 


THEOREM 10.7.2 Let E be an exchange matrix such that for some positive integer m
all the entries of $E^m$ are positive. Then there is exactly one linearly independent solution
of $(I - E)\mathbf{p} = \mathbf{0}$, and it may be chosen so that all its entries are positive.


We will not give a proof of this theorem. If you have read Section 10.4 on Markov
chains, observe that this theorem is essentially the same as Theorem 10.4.4. What we
are calling exchange matrices in this section were called stochastic or Markov matrices
in Section 10.4.


► EXAMPLE 4 Using Theorem 10.7.2

The exchange matrix in Example 1 was

$$E = \begin{bmatrix} .2 & .1 & .6 \\ .4 & .5 & .1 \\ .4 & .4 & .3 \end{bmatrix}$$

Because $E > 0$, the condition $E^m > 0$ in Theorem 10.7.2 is satisfied for $m = 1$. Consequently,
we are guaranteed that there is exactly one linearly independent solution of
$(I - E)\mathbf{p} = \mathbf{0}$, and it can be chosen so that $\mathbf{p} > 0$. In that example, we found that

$$\mathbf{p} = \begin{bmatrix} 31 \\ 32 \\ 36 \end{bmatrix}$$

is such a solution.


Leontief Open (Production) Model

In contrast with the closed model, in which the outputs of k industries are distributed only 
among themselves, the open model attempts to satisfy an outside demand for the outputs. 
Portions of these outputs can still be distributed among the industries themselves, to keep 
them operating, but there is to be some excess, some net production, with which to satisfy 
the outside demand. In the closed model the outputs of the industries are fixed, and 
our objective is to determine prices for these outputs so that the equilibrium condition, 
that expenditures equal incomes, is satisfied. In the open model it is the prices that are 
fixed, and our objective is to determine levels of the outputs of the industries needed to 
satisfy the outside demand. We will measure the levels of the outputs in terms of their 
economic values using the fixed prices. To be precise, over some fixed period of time, let 

$x_i$ = monetary value of the total output of the ith industry
$d_i$ = monetary value of the output of the ith industry needed to satisfy
the outside demand
$c_{ij}$ = monetary value of the output of the ith industry needed by the jth
industry to produce one unit of monetary value of its own output


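With these quantities, we define the production vector

$$\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_k \end{bmatrix}$$

the demand vector

$$\mathbf{d} = \begin{bmatrix} d_1 \\ d_2 \\ \vdots \\ d_k \end{bmatrix}$$

and the consumption matrix

$$C = \begin{bmatrix} c_{11} & c_{12} & \cdots & c_{1k} \\ c_{21} & c_{22} & \cdots & c_{2k} \\ \vdots & \vdots & & \vdots \\ c_{k1} & c_{k2} & \cdots & c_{kk} \end{bmatrix}$$

By their nature, we have that

$$\mathbf{x} \ge 0, \quad \mathbf{d} \ge 0, \quad \text{and} \quad C \ge 0$$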

From the definition of $c_{ij}$ and $x_j$, it can be seen that the quantity

$$c_{i1}x_1 + c_{i2}x_2 + \cdots + c_{ik}x_k$$

is the value of the output of the ith industry needed by all k industries to produce a total
output specified by the production vector x. Because this quantity is simply the ith entry
of the column vector $C\mathbf{x}$, we can say further that the ith entry of the column vector

$$\mathbf{x} - C\mathbf{x}$$

is the value of the excess output of the ith industry available to satisfy the outside demand.
The value of the outside demand for the output of the ith industry is the ith entry of the
demand vector d. Consequently, we are led to the following equation

$$\mathbf{x} - C\mathbf{x} = \mathbf{d}$$

or

$$(I - C)\mathbf{x} = \mathbf{d} \tag{4}$$

for the demand to be exactly met, without any surpluses or shortages. Thus, given C
and d, our objective is to find a production vector $\mathbf{x} \ge 0$ that satisfies Equation (4).


► EXAMPLE 5 Production Vector for a Town

A town has three main industries: a coal-mining operation, an electric power-generating 
plant, and a local railroad. To mine $1 of coal, the mining operation must purchase 
$.25 of electricity to run its equipment and $.25 of transportation for its shipping needs. 
To produce $1 of electricity, the generating plant requires $.65 of coal for fuel, $.05 of 
its own electricity to run auxiliary equipment, and $.05 of transportation. To provide 
$1 of transportation, the railroad requires $.55 of coal for fuel and $.10 of electricity 
for its auxiliary equipment. In a certain week the coal-mining operation receives orders 
for $50,000 of coal from outside the town, and the generating plant receives orders for 
$25,000 of electricity from outside. There is no outside demand for the local railroad. 
How much must each of the three industries produce in that week to exactly satisfy their 
own demand and the outside demand? 


Solution For the one-week period let 

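$x_1$ = value of total output of coal-mining operation
$x_2$ = value of total output of power-generating plant
$x_3$ = value of total output of local railroad


From the information supplied, the consumption matrix of the system is

$$C = \begin{bmatrix} 0 & .65 & .55 \\ .25 & .05 & .10 \\ .25 & .05 & 0 \end{bmatrix}$$

The linear system $(I - C)\mathbf{x} = \mathbf{d}$ is then

$$\begin{bmatrix} 1.00 & -.65 & -.55 \\ -.25 & .95 & -.10 \\ -.25 & -.05 & 1.00 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 50{,}000 \\ 25{,}000 \\ 0 \end{bmatrix}$$

The coefficient matrix on the left is invertible, and the solution is given by

$$\mathbf{x} = (I - C)^{-1}\mathbf{d} = \frac{1}{503} \begin{bmatrix} 756 & 542 & 470 \\ 220 & 690 & 190 \\ 200 & 170 & 630 \end{bmatrix} \begin{bmatrix} 50{,}000 \\ 25{,}000 \\ 0 \end{bmatrix} = \begin{bmatrix} 102{,}087 \\ 56{,}163 \\ 28{,}330 \end{bmatrix}$$
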

Thus, the total output of the coal-mining operation should be $102,087, the total output 
of the power-generating plant should be $56,163, and the total output of the railroad 
should be $28,330. 
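
The production vector in Example 5 can be reproduced by solving (I − C)x = d directly. The following Python sketch uses a small Gauss-Jordan routine over exact fractions; the helper name `solve` is hypothetical, and the routine assumes the coefficient matrix is invertible, as the example states.

```python
from fractions import Fraction

def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination.

    Works in exact fractions; assumes the coefficient matrix is invertible.
    """
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Find a row with a nonzero entry in this column and move it up.
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        pivot_value = m[col][col]
        m[col] = [v / pivot_value for v in m[col]]
        # Clear the column in every other row.
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [v - factor * w for v, w in zip(m[r], m[col])]
    return [row[n] for row in m]

# Consumption matrix and outside demand of Example 5.
C = [["0", "0.65", "0.55"],
     ["0.25", "0.05", "0.10"],
     ["0.25", "0.05", "0"]]
d = [50000, 25000, 0]

k = len(C)
I_minus_C = [[Fraction(int(i == j)) - Fraction(C[i][j]) for j in range(k)]
             for i in range(k)]
x = solve(I_minus_C, [Fraction(v) for v in d])
print([round(v) for v in x])  # [102087, 56163, 28330]
```

The decimal entries are given as strings so that `Fraction` stores them exactly rather than as binary floating-point approximations.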


Let us reconsider Equation (4): 

(I — C)x = d 

If the square matrix I — C is invertible, we can write 

$$\mathbf{x} = (I - C)^{-1}\mathbf{d} \tag{5}$$

In addition, if the matrix $(I - C)^{-1}$ has only nonnegative entries, then we are guaranteed
that for any $\mathbf{d} \ge 0$, Equation (5) has a unique nonnegative solution for x. This is a
particularly desirable situation, as it means that any outside demand can be met. The
terminology used to describe this case is given in the following definition.


DEFINITION 1 A consumption matrix C is said to be productive if $(I - C)^{-1}$ exists
and

$$(I - C)^{-1} \ge 0$$


We will now consider some simple criteria that guarantee that a consumption matrix 
is productive. The first is given in the following theorem. 


THEOREM 10.7.3 Productive Consumption Matrix

A consumption matrix C is productive if and only if there is some production vector
$\mathbf{x} \ge 0$ such that $\mathbf{x} > C\mathbf{x}$.


(The proof is outlined in Exercise 9.) The condition x > Cx means that there is some 
production schedule possible such that each industry produces more than it consumes. 

Theorem 10.7.3 has two interesting corollaries. Suppose that all the row sums of C
are less than 1. If

$$\mathbf{x} = \begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix}$$

then $C\mathbf{x}$ is a column vector whose entries are these row sums. Therefore, $\mathbf{x} > C\mathbf{x}$, and
the condition of Theorem 10.7.3 is satisfied. Thus, we arrive at the following corollary:


COROLLARY 10.7.4 A consumption matrix is productive if each of its row sums is less
than 1.




As we ask you to show in Exercise 8, this corollary leads to the following: 


COROLLARY 10.7.5 A consumption matrix is productive if each of its column sums is
less than 1.


Recalling the definition of the entries of the consumption matrix C, we see that the jth
column sum of C is the total value of the outputs of all k industries needed to produce
one unit of value of output of the jth industry. The jth industry is thus said to be
profitable if that jth column sum is less than 1. In other words, Corollary 10.7.5 says
that a consumption matrix is productive if all k industries in the economic system are
profitable.


► EXAMPLE 6 Using Corollary 10.7.5 

The consumption matrix in Example 5 was

$$C = \begin{bmatrix} 0 & .65 & .55 \\ .25 & .05 & .10 \\ .25 & .05 & 0 \end{bmatrix}$$

All three column sums in this matrix are less than 1, so all three industries are profitable.
Consequently, by Corollary 10.7.5, the consumption matrix C is productive. This can
also be seen in the calculations in Example 5, as $(I - C)^{-1}$ is nonnegative.
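
The column-sum test of Corollary 10.7.5 is simple to mechanize. The short Python sketch below applies it to the consumption matrix of Example 5; the function name `is_profitable` is a hypothetical choice. Note that the test is only a sufficient condition: a result of False does not by itself show that a matrix fails to be productive.

```python
def is_profitable(C):
    """Return True if every column sum of C is less than 1 (Corollary 10.7.5)."""
    k = len(C)
    return all(sum(C[i][j] for i in range(k)) < 1 for j in range(k))

# Consumption matrix of Example 5.
C = [[0.00, 0.65, 0.55],
     [0.25, 0.05, 0.10],
     [0.25, 0.05, 0.00]]

column_sums = [round(sum(row[j] for row in C), 2) for j in range(3)]
print(column_sums)        # [0.5, 0.75, 0.65]
print(is_profitable(C))   # True
```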


Exercise Set 10.7 


1. For the following exchange matrices, find nonnegative price 
vectors that satisfy the equilibrium condition (3). 


(a) 

"1 1" 

2 3 

1 2 



.2 3 _ 



".35 

50 .30" 

(c) 

.25 .20 .30 


_.40 

30 .40_ 


4. Three neighbors have backyard vegetable gardens. Neighbor 
A grows tomatoes, neighbor B grows corn, and neighbor C 
grows lettuce. They agree to divide their crops among them- 
selves as follows: A gets 1 of the tomatoes, | of the corn, and 
1 of the lettuce. B gets | of the tomatoes, | of the corn, and 
1 of the lettuce. C gets 1 of the tomatoes, | of the corn, | of 
the lettuce. What prices should the neighbors assign to their 
respective crops if the equilibrium condition of a closed econ- 
omy is to be satisfied, and if the lowest-priced crop is to have a 
price of $100? 


2. Using Theorem 10.7.3 and its corollaries, show that each of the 
following consumption matrices is productive. 



'.8 .f 


'.70 

.30 

.25 

(a) 

(b) 

.20 

.40 

.25 


• J • vl 


.05 

.15 

.25 


(c) 


'.7 

.1 

.2 


.3 

.4 

.4 


. 2 " 

.3 

.1 


3. Using Theorem 10.7.2, show that there is only one linearly in- 
dependent price vector for the closed economic system with 
exchange matrix 


$$E = \begin{bmatrix} 0 & .2 & .5 \\ 1 & .2 & .5 \\ 0 & .6 & 0 \end{bmatrix}$$


5. Three engineers — a civil engineer (CE), an electrical engineer 
(EE), and a mechanical engineer (ME) — each have a consulting 
firm. The consulting they do is of a multidisciplinary nature, 
so they buy a portion of each others’ services. For each $1 
of consulting the CE does, she buys $.10 of the EE’s services 
and $.30 of the ME’s services. For each $1 of consulting the 
EE does, she buys $.20 of the CE’s services and $.40 of the 
ME’s services. And for each $1 of consulting the ME does, she 
buys $.30 of the CE’s services and $.40 of the EE’s services. 
In a certain week the CE receives outside consulting orders of 
$500, the EE receives outside consulting orders of $700, and 
the ME receives outside consulting orders of $600. What dollar 
amount of consulting does each engineer perform in that week? 

6. (a) Suppose that the demand $d_i$ for the output of the ith industry
increases by one unit. Explain why the ith column of
the matrix $(I - C)^{-1}$ is the increase that must be made to
the production vector x to satisfy this additional demand.
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(b) Referring to Example 5, use the result in part (a) to de- 
termine the increase in the value of the output of the 
coal-mining operation needed to satisfy a demand of one 
additional unit in the value of the output of the power- 
generating plant. 

7. Using the fact that the column sums of an exchange matrix E are
all 1, show that the column sums of I − E are zero. From this,
show that I − E has zero determinant, and so (I − E)p = 0
has nontrivial solutions for p.

8. Show that Corollary 10.7.5 follows from Corollary 10.7.4.
[Hint: Use the fact that $(A^T)^{-1} = (A^{-1})^T$ for any invertible
matrix A.]

9. (Calculus required) Prove Theorem 10.7.3 as follows:

(a) Prove the “only if” part of the theorem; that is, show that
if C is a productive consumption matrix, then there is a
vector x ≥ 0 such that x > Cx.

(b) Prove the “if” part of the theorem as follows:

Step 1. Show that if there is a vector x* ≥ 0 such that
Cx* < x*, then x* > 0.

Step 2. Show that there is a number λ such that 0 < λ < 1
and Cx* < λx*.

Step 3. Show that $C^n\mathbf{x}^* < \lambda^n\mathbf{x}^*$ for n = 1, 2, . . . .

Step 4. Show that $C^n \to 0$ as $n \to \infty$.

Step 5. By multiplying out, show that

$$(I - C)(I + C + C^2 + \cdots + C^{n-1}) = I - C^n$$

for n = 1, 2, . . . .

Step 6. By letting $n \to \infty$ in Step 5, show that the matrix
infinite sum

$$S = I + C + C^2 + \cdots$$

exists and that (I − C)S = I.

Step 7. Show that S ≥ 0 and that $S = (I - C)^{-1}$.

Step 8. Show that C is a productive consumption matrix.

Working with Technology

The following exercises are designed to be solved using a technology
utility. Typically, this will be MATLAB, Mathematica, Maple,
Derive, or Mathcad, but it may also be some other type of linear
algebra software or a scientific calculator with some linear algebra
capabilities. For each exercise you will need to read the relevant
documentation for the particular utility you are using. The goal
of these exercises is to provide you with a basic proficiency with
your technology utility. Once you have mastered the techniques
in these exercises, you will be able to use your technology utility
to solve many of the problems in the regular exercise sets.

T1. Consider a sequence of exchange matrices
$\{E_2, E_3, E_4, E_5, \ldots, E_n\}$, where

$$E_2 = \begin{bmatrix} 0 & \frac{1}{2} \\ 1 & \frac{1}{2} \end{bmatrix}, \qquad
E_3 = \begin{bmatrix} 0 & \frac{1}{2} & \frac{1}{3} \\ 1 & 0 & \frac{1}{3} \\ 0 & \frac{1}{2} & \frac{1}{3} \end{bmatrix},$$

$$E_4 = \begin{bmatrix} 0 & \frac{1}{2} & \frac{1}{3} & \frac{1}{4} \\ 1 & 0 & \frac{1}{3} & \frac{1}{4} \\ 0 & \frac{1}{2} & 0 & \frac{1}{4} \\ 0 & 0 & \frac{1}{3} & \frac{1}{4} \end{bmatrix}, \qquad
E_5 = \begin{bmatrix} 0 & \frac{1}{2} & \frac{1}{3} & \frac{1}{4} & \frac{1}{5} \\ 1 & 0 & \frac{1}{3} & \frac{1}{4} & \frac{1}{5} \\ 0 & \frac{1}{2} & 0 & \frac{1}{4} & \frac{1}{5} \\ 0 & 0 & \frac{1}{3} & 0 & \frac{1}{5} \\ 0 & 0 & 0 & \frac{1}{4} & \frac{1}{5} \end{bmatrix}$$

and so on. Use a computer to show that $E_2^2 > 0_2$, $E_3^3 > 0_3$, $E_4^4 > 0_4$,
$E_5^5 > 0_5$, and make the conjecture that although $E_n^n > 0_n$ is
true, $E_n^k > 0_n$ is not true for k = 1, 2, 3, . . . , n − 1. Next, use a
computer to determine the vectors $\mathbf{p}_n$ such that $E_n\mathbf{p}_n = \mathbf{p}_n$ (for
n = 2, 3, 4, 5, 6), and then see if you can discover a pattern that
would allow you to compute $\mathbf{p}_{n+1}$ easily from $\mathbf{p}_n$. Test your
discovery by first constructing $\mathbf{p}_8$ from

$$\mathbf{p}_7 = \begin{bmatrix} 2520 \\ 3360 \\ 1890 \\ 672 \\ 175 \\ 36 \\ 7 \end{bmatrix}$$

and then checking to see whether $E_8\mathbf{p}_8 = \mathbf{p}_8$.


T2. Consider an open production model having n industries with
n > 1. In order to produce $1 of its own output, the jth industry
must spend $(1/n) for the output of the ith industry (for all i ≠ j),
but the jth industry (for all j = 1, 2, 3, . . . , n) spends nothing
for its own output. Construct the consumption matrix C_n, show
that it is productive, and determine an expression for (I_n − C_n)^{-1}.
In determining an expression for (I_n − C_n)^{-1}, use a computer to
study the cases when n = 2, 3, 4, and 5; then make a conjecture
and prove your conjecture to be true. [Hint: If F_n = [1]_{n×n} (i.e.,
the n × n matrix with every entry equal to 1), first show that

$$F_n^2 = nF_n$$

and then express your value of (I_n − C_n)^{-1} in terms of n, I_n,
and F_n.]
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10.8 Forest Management 

In this section we discuss a matrix model for the management of a forest where trees are 
grouped into classes according to height. The optimal sustainable yield of a periodic harvest 
is calculated when the trees of different height classes can have different economic values. 

Matrix Operations 


Optimal Sustainable Yield

Our objective is to introduce a simplified model for the sustainable harvesting of a forest
whose trees are classified by height. The height of a tree is assumed to determine its
economic value when it is cut down and sold. Initially, there is a distribution of trees 
of various heights. The forest is then allowed to grow for a certain period of time, after 
which some of the trees of various heights are harvested. The trees left unharvested 
are to be of the same height configuration as the original forest, so that the harvest is 
sustainable. As we will see, there are many such sustainable harvesting procedures. We 
want to find one for which the total economic value of all the trees removed is as large 
as possible. This determines the optimal sustainable yield of the forest and is the largest 
yield that can be attained continually without depleting the forest. 


The Model

Suppose that a harvester has a forest of Douglas fir trees that are to be sold as Christmas
trees year after year. Every December the harvester cuts down some of the trees to be 
sold. For each tree cut down, a seedling is planted in its place. In this way the total 
number of trees in the forest is always the same. (In this simplified model, we will not 
take into account trees that die between harvests. We assume that every seedling planted 
survives and grows until it is harvested.) 

In the marketplace, trees of different heights have different economic values. Suppose 
that there are n different price classes corresponding to certain height intervals, as shown 
in Table 1 and Figure 10.8.1. The first class consists of seedlings with heights in the
interval $[0, h_1)$, and these seedlings are of no economic value. The nth class consists of
trees with heights greater than or equal to $h_{n-1}$.


Table 1

Class            Value (dollars)    Height Interval
1 (seedlings)    None               [0, h_1)
2                p_2                [h_1, h_2)
3                p_3                [h_2, h_3)
...              ...                ...
n − 1            p_{n−1}            [h_{n−2}, h_{n−1})
n                p_n                [h_{n−1}, ∞)



▲ Figure 10.8.1

Let $x_i$ (i = 1, 2, . . . , n) be the number of trees within the ith class that remain after
each harvest. We form a column vector with the numbers and call it the nonharvest
vector:

$$\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}$$

For a sustainable harvesting policy, the forest is to be returned after each harvest to the 
fixed configuration given by the nonharvest vector x. Part of our problem is to find those 
nonharvest vectors x for which sustainable harvesting is possible. 

Because the total number of trees in the forest is fixed, we can set 


$$x_1 + x_2 + \cdots + x_n = s \tag{1}$$

where s is predetermined by the amount of land available and the amount of space each 
tree requires. Referring to Figure 10.8.2, we have the following situation. The forest 
configuration is given by the vector x after each harvest. Between harvests the trees 
grow and produce a new forest configuration before each harvest. A certain number of 
trees are removed from each class at the harvest. Finally, a seedling is planted in place 
of each tree removed, to return the forest again to the configuration x. 



Forest before growth Forest after harvest 

(nonharvest vector x) (nonharvest vector x) 


▲ Figure 10.8.2 


Consider first the growth of the forest between harvests. During this period a tree
in the ith class may grow and move up to a higher height class. Or its growth may be
retarded for some reason, and it will remain in the same class. We consequently define
the following growth parameters $g_i$ for i = 1, 2, . . . , n − 1:

$g_i$ = the fraction of trees in the ith class that grow into
the (i + 1)-st class during a growth period

For simplicity we assume that a tree can move at most one height class upward in one
growth period. With this assumption, we have

$1 - g_i$ = the fraction of trees in the ith class that remain in
the ith class during a growth period

With these n − 1 growth parameters, we form the following n × n growth matrix:

$$G = \begin{bmatrix}
1 - g_1 & 0 & 0 & \cdots & 0 & 0 \\
g_1 & 1 - g_2 & 0 & \cdots & 0 & 0 \\
0 & g_2 & 1 - g_3 & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & & \vdots & \vdots \\
0 & 0 & 0 & \cdots & 1 - g_{n-1} & 0 \\
0 & 0 & 0 & \cdots & g_{n-1} & 1
\end{bmatrix} \tag{2}$$



Because the entries of the vector x are the numbers of trees in the n classes before the
growth period, you can verify that the entries of the vector

$$G\mathbf{x} = \begin{bmatrix}
(1 - g_1)x_1 \\
g_1x_1 + (1 - g_2)x_2 \\
g_2x_2 + (1 - g_3)x_3 \\
\vdots \\
g_{n-2}x_{n-2} + (1 - g_{n-1})x_{n-1} \\
g_{n-1}x_{n-1} + x_n
\end{bmatrix} \tag{3}$$

are the numbers of trees in the n classes after the growth period.
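
The construction of G from the growth parameters, and the product Gx, can be sketched in a few lines. This is only an illustration: the function names `growth_matrix` and `mat_vec` are hypothetical, and the growth parameters and tree counts are made-up values, not data from the text.

```python
def growth_matrix(g):
    """Build the n x n growth matrix G of Equation (2) from g_1, ..., g_{n-1}."""
    n = len(g) + 1
    G = [[0.0] * n for _ in range(n)]
    for i, gi in enumerate(g):
        G[i][i] = 1.0 - gi       # fraction that stays in class i+1
        G[i + 1][i] = gi         # fraction that moves up one class
    G[n - 1][n - 1] = 1.0        # trees in the tallest class stay there
    return G

def mat_vec(M, v):
    """Multiply a matrix (list of rows) by a column vector (list)."""
    return [sum(a * b for a, b in zip(row, v)) for row in M]

g = [0.5, 0.25]                  # hypothetical growth parameters (n = 3 classes)
x = [600.0, 300.0, 100.0]        # hypothetical nonharvest vector
G = growth_matrix(g)
after_growth = mat_vec(G, x)     # entries of Gx, as in Equation (3)
print(after_growth)
```

Because each column of G sums to 1, growth only moves trees between classes; the total number of trees is unchanged.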

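Suppose that during the harvest we remove $y_i$ (i = 1, 2, . . . , n) trees from the ith
class. We will call the column vector

$$\mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}$$

the harvest vector. Thus, a total of

$$y_1 + y_2 + \cdots + y_n$$

trees are removed at each harvest. This is also the total number of trees added to the first
class (the new seedlings) after each harvest. If we define the following n × n replacement
matrix

$$R = \begin{bmatrix}
1 & 1 & \cdots & 1 \\
0 & 0 & \cdots & 0 \\
\vdots & \vdots & & \vdots \\
0 & 0 & \cdots & 0
\end{bmatrix} \tag{4}$$

then the column vector

$$R\mathbf{y} = \begin{bmatrix} y_1 + y_2 + \cdots + y_n \\ 0 \\ \vdots \\ 0 \end{bmatrix} \tag{5}$$

specifies the configuration of trees planted after each harvest.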

At this point we are ready to write the following equation, which characterizes a 
sustainable harvesting policy: 


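[configuration at end of growth period] − [harvest] + [new seedling replacement] = [configuration at beginning of growth period]

or mathematically,

$$G\mathbf{x} - \mathbf{y} + R\mathbf{y} = \mathbf{x}$$

Optimal Sustainable Yield

This equation can be rewritten as

$$(I - R)\mathbf{y} = (G - I)\mathbf{x} \tag{6}$$

or more comprehensively as

$$\begin{bmatrix}
0 & -1 & -1 & \cdots & -1 & -1 \\
0 & 1 & 0 & \cdots & 0 & 0 \\
0 & 0 & 1 & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & & \vdots & \vdots \\
0 & 0 & 0 & \cdots & 1 & 0 \\
0 & 0 & 0 & \cdots & 0 & 1
\end{bmatrix}
\begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ \vdots \\ y_{n-1} \\ y_n \end{bmatrix}
=
\begin{bmatrix}
-g_1 & 0 & 0 & \cdots & 0 & 0 \\
g_1 & -g_2 & 0 & \cdots & 0 & 0 \\
0 & g_2 & -g_3 & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & & \vdots & \vdots \\
0 & 0 & 0 & \cdots & -g_{n-1} & 0 \\
0 & 0 & 0 & \cdots & g_{n-1} & 0
\end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ \vdots \\ x_{n-1} \\ x_n \end{bmatrix}$$
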

We will refer to Equation (6) as the sustainable harvesting condition. Any vectors x and y
with nonnegative entries, and such that $x_1 + x_2 + \cdots + x_n = s$, which satisfy this matrix
equation, determine a sustainable harvesting policy for the forest. Note that if $y_1 > 0$,
then the harvester is removing seedlings of no economic value and replacing them with
new seedlings. Because there is no point in doing this, we assume that

$$y_1 = 0 \tag{7}$$

With this assumption, it can be verified that (6) is the matrix form of the following set
of equations:

$$\begin{aligned}
y_2 + y_3 + \cdots + y_n &= g_1x_1 \\
y_2 &= g_1x_1 - g_2x_2 \\
y_3 &= g_2x_2 - g_3x_3 \\
&\;\;\vdots \\
y_{n-1} &= g_{n-2}x_{n-2} - g_{n-1}x_{n-1} \\
y_n &= g_{n-1}x_{n-1}
\end{aligned} \tag{8}$$

Note that the first equation in (8) is the sum of the remaining n − 1 equations.

Because we must have $y_i \ge 0$ for i = 2, 3, . . . , n, Equations (8) require that

$$g_1x_1 \ge g_2x_2 \ge \cdots \ge g_{n-1}x_{n-1} \ge 0 \tag{9}$$

Conversely, if x is a column vector with nonnegative entries that satisfy Equation (9),
then (7) and (8) define a column vector y with nonnegative entries. Furthermore, x and
y then satisfy the sustainable harvesting condition (6). In other words, a necessary and
sufficient condition for a nonnegative column vector x to determine a forest configuration
that is capable of sustainable harvesting is that its entries satisfy (9).

Because we remove trees from the ith class (i = 2, 3, . . . , n) and each tree in the ith
class has an economic value of $p_i$, the total yield of the harvest, Yld, is given by

$$\text{Yld} = p_2y_2 + p_3y_3 + \cdots + p_ny_n \tag{10}$$

Using (8), we may substitute for the $y_i$'s in (10) to obtain

$$\text{Yld} = p_2g_1x_1 + (p_3 - p_2)g_2x_2 + \cdots + (p_n - p_{n-1})g_{n-1}x_{n-1} \tag{11}$$

Combining (11), (1), and (9), we can now state the problem of maximizing the yield
of the forest over all possible sustainable harvesting policies as follows:

Problem Find nonnegative numbers $x_1, x_2, \ldots, x_n$ that maximize

$$\text{Yld} = p_2g_1x_1 + (p_3 - p_2)g_2x_2 + \cdots + (p_n - p_{n-1})g_{n-1}x_{n-1}$$

subject to

$$x_1 + x_2 + \cdots + x_n = s$$

and

$$g_1x_1 \ge g_2x_2 \ge \cdots \ge g_{n-1}x_{n-1} \ge 0$$
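
The relationships among conditions (6) through (11) can be checked on a small case. The sketch below is illustrative only: the function names are hypothetical, and the growth parameters, prices, and tree counts are made-up numbers chosen so the arithmetic is exact. It tests condition (9), builds y from (7) and (8), confirms that growing, harvesting, and replanting returns the forest to x, and compares the yield computed from (10) with the yield computed from (11).

```python
def grow(g, x):
    """Configuration after one growth period: the entries of Gx in Equation (3)."""
    n = len(x)
    grown = [(1 - g[0]) * x[0]]
    for i in range(1, n - 1):
        grown.append(g[i - 1] * x[i - 1] + (1 - g[i]) * x[i])
    grown.append(g[n - 2] * x[n - 2] + x[n - 1])
    return grown

def is_sustainable(g, x):
    """Condition (9): g_1 x_1 >= g_2 x_2 >= ... >= g_{n-1} x_{n-1} >= 0."""
    flows = [gi * xi for gi, xi in zip(g, x)]
    return all(a >= b for a, b in zip(flows, flows[1:])) and flows[-1] >= 0

def harvest_vector(g, x):
    """Harvest vector y defined by Equations (7) and (8)."""
    n = len(x)
    y = [0.0] * n                              # y_1 = 0, Equation (7)
    for i in range(1, n - 1):                  # y_2, ..., y_{n-1}
        y[i] = g[i - 1] * x[i - 1] - g[i] * x[i]
    y[n - 1] = g[n - 2] * x[n - 2]             # y_n
    return y

def yield_from_harvest(p, y):
    """Equation (10); p[0] is the (zero) value of a seedling."""
    return sum(pi * yi for pi, yi in zip(p, y))

def yield_from_configuration(p, g, x):
    """Equation (11), written with p[0] = 0 so the first term is p_2 g_1 x_1."""
    return sum((p[i + 1] - p[i]) * g[i] * x[i] for i in range(len(g)))

g = [0.5, 0.25]                # hypothetical growth parameters
x = [400.0, 400.0, 200.0]      # hypothetical nonharvest vector, s = 1000
p = [0.0, 50.0, 100.0]         # hypothetical prices p_1 (= 0), p_2, p_3

y = harvest_vector(g, x)
replanted = [sum(y)] + [0.0] * (len(x) - 1)    # the vector Ry of Equation (5)
restored = [a - b + c for a, b, c in zip(grow(g, x), y, replanted)]

print(is_sustainable(g, x), y, restored == x)
print(yield_from_harvest(p, y), yield_from_configuration(p, g, x))
```

A configuration that violates (9) would produce a negative entry in y, which is how the condition shows up in practice.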


As formulated above, this problem belongs to the field of linear programming. However, 
we will illustrate the following result, without linear programming theory, by actually 
exhibiting a sustainable harvesting policy. 


THEOREM 10.8.1 Optimal Sustainable Yield

The optimal sustainable yield is achieved by harvesting all the trees from one particular
height class and none of the trees from any other height class.


Let us first set

$\text{Yld}_k$ = yield obtained by harvesting all of the kth
class and none of the other classes

The largest value of $\text{Yld}_k$ for k = 2, 3, . . . , n will then be the optimal sustainable yield,
and the corresponding value of k will be the class that should be completely harvested to
attain the optimal sustainable yield. Because no class but the kth is harvested, we have

$$y_2 = y_3 = \cdots = y_{k-1} = y_{k+1} = \cdots = y_n = 0 \tag{12}$$

In addition, because all of the kth class is harvested, no trees are ever present in the
height classes above the kth class. Thus,

$$x_k = x_{k+1} = \cdots = x_n = 0 \tag{13}$$

Substituting (12) and (13) into the sustainable harvesting condition (8) gives

$$\begin{aligned}
y_k &= g_1x_1 \\
0 &= g_1x_1 - g_2x_2 \\
0 &= g_2x_2 - g_3x_3 \\
&\;\;\vdots \\
0 &= g_{k-2}x_{k-2} - g_{k-1}x_{k-1} \\
y_k &= g_{k-1}x_{k-1}
\end{aligned} \tag{14}$$

Equations (14) can also be written as

$$y_k = g_1x_1 = g_2x_2 = \cdots = g_{k-1}x_{k-1} \tag{15}$$

from which it follows that

$$\begin{aligned}
x_2 &= g_1x_1/g_2 \\
x_3 &= g_1x_1/g_3 \\
&\;\;\vdots \\
x_{k-1} &= g_1x_1/g_{k-1}
\end{aligned} \tag{16}$$

If we substitute Equations (13) and (16) into

$$x_1 + x_2 + \cdots + x_n = s$$

[which is Equation (1)], we can solve for $x_1$ and obtain

$$x_1 = \frac{s}{1 + \dfrac{g_1}{g_2} + \dfrac{g_1}{g_3} + \cdots + \dfrac{g_1}{g_{k-1}}} \tag{17}$$

For the yield $\text{Yld}_k$, we combine (10), (12), (15), and (17) to obtain

$$\begin{aligned}
\text{Yld}_k &= p_2y_2 + p_3y_3 + \cdots + p_ny_n \\
&= p_ky_k \\
&= p_kg_1x_1 \\
&= \frac{p_k s}{\dfrac{1}{g_1} + \dfrac{1}{g_2} + \cdots + \dfrac{1}{g_{k-1}}}
\end{aligned} \tag{18}$$


Equation (18) determines $\text{Yld}_k$ in terms of the known growth and economic parameters
for any k = 2, 3, . . . , n. Thus, the optimal sustainable yield is found as follows.


THEOREM 10.8.2 Finding the Optimal Sustainable Yield

The optimal sustainable yield is the largest value of

$$\frac{p_k s}{\dfrac{1}{g_1} + \dfrac{1}{g_2} + \cdots + \dfrac{1}{g_{k-1}}}$$

for k = 2, 3, . . . , n. The corresponding value of k is the number of the class that is
completely harvested.


In Exercise 4 we ask you to show that the nonharvest vector x for the optimal sustainable
yield is

$$\mathbf{x} = \frac{s}{\dfrac{1}{g_1} + \dfrac{1}{g_2} + \cdots + \dfrac{1}{g_{k-1}}}
\begin{bmatrix} 1/g_1 \\ 1/g_2 \\ \vdots \\ 1/g_{k-1} \\ 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix} \tag{19}$$

Theorem 10.8.2 implies that it is not necessarily the highest-priced class of trees that
should be totally cropped. The growth parameters $g_i$ must also be taken into account
to determine the optimal sustainable yield.


► EXAMPLE 1 Using Theorem 10.8.2

For a Scots pine forest in Scotland with a growth period of six years, the following
growth matrix was found (see M. B. Usher, “A Matrix Approach to the Management of
Renewable Resources, with Special Reference to Selection Forests,” Journal of Applied
Ecology, vol. 3, 1966, pp. 355-367):

$$G = \begin{bmatrix}
.72 & 0 & 0 & 0 & 0 & 0 \\
.28 & .69 & 0 & 0 & 0 & 0 \\
0 & .31 & .75 & 0 & 0 & 0 \\
0 & 0 & .25 & .77 & 0 & 0 \\
0 & 0 & 0 & .23 & .63 & 0 \\
0 & 0 & 0 & 0 & .37 & 1.00
\end{bmatrix}$$

Suppose that the prices of trees in the five tallest height classes are

$$p_2 = \$50, \quad p_3 = \$100, \quad p_4 = \$150, \quad p_5 = \$200, \quad p_6 = \$250$$

Which class should be completely harvested to obtain the optimal sustainable yield, and
what is that yield?

Solution From matrix G we have that

$$g_1 = .28, \quad g_2 = .31, \quad g_3 = .25, \quad g_4 = .23, \quad g_5 = .37$$

Equation (18) then gives

$$\begin{aligned}
\text{Yld}_2 &= 50s/(.28^{-1}) = 14.0s \\
\text{Yld}_3 &= 100s/(.28^{-1} + .31^{-1}) = 14.7s \\
\text{Yld}_4 &= 150s/(.28^{-1} + .31^{-1} + .25^{-1}) = 13.9s \\
\text{Yld}_5 &= 200s/(.28^{-1} + .31^{-1} + .25^{-1} + .23^{-1}) = 13.2s \\
\text{Yld}_6 &= 250s/(.28^{-1} + .31^{-1} + .25^{-1} + .23^{-1} + .37^{-1}) = 14.0s
\end{aligned}$$

We see that Yld_3 is the largest of these five quantities, so from Theorem 10.8.2 the third
class should be completely harvested every six years to maximize the sustainable yield.
The corresponding optimal sustainable yield is $14.7s, where s is the total number of
trees in the forest.
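
The calculation in this example is easy to automate. The following sketch evaluates Equation (18) for every class and then builds the nonharvest vector of Equation (19) for the best class. The function names `single_class_yields` and `nonharvest_vector` are hypothetical; the growth parameters and prices are the ones given in the example, and s is taken as 1 so that each result is a yield per tree.

```python
def single_class_yields(g, p, s=1.0):
    """Yld_k from Equation (18) for k = 2, ..., n.

    g: growth parameters g_1..g_{n-1}; p: prices p_2..p_n; s: total trees.
    Returns a dict mapping the class number k to Yld_k.
    """
    yields = {}
    inv_sum = 0.0
    for k in range(2, len(g) + 2):
        inv_sum += 1.0 / g[k - 2]            # running sum 1/g_1 + ... + 1/g_{k-1}
        yields[k] = p[k - 2] * s / inv_sum
    return yields

def nonharvest_vector(g, k, s=1.0):
    """Equation (19): the configuration left standing when class k is harvested."""
    inv = [1.0 / gi for gi in g[:k - 1]]
    total = sum(inv)
    zeros = [0.0] * (len(g) + 1 - (k - 1))   # classes k, ..., n hold no trees
    return [s * v / total for v in inv] + zeros

g = [0.28, 0.31, 0.25, 0.23, 0.37]           # from the growth matrix G
p = [50, 100, 150, 200, 250]                 # prices p_2, ..., p_6

yields = single_class_yields(g, p)
best = max(yields, key=yields.get)
x_opt = nonharvest_vector(g, best)

print({k: round(v, 1) for k, v in yields.items()})
print(best, [round(v, 3) for v in x_opt])
```

Rounded to one decimal place, the five yields agree with the values in the solution, and the class selected is the third. The vector `x_opt` shows that only the first two classes are populated after each harvest.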


Exercise Set 10.8 

1. A certain forest is divided into three height classes and has a 
growth matrix between harvests given by 


G = 


0 

1 
3 

2 
3 


0 

0 

1 


4. Derive Equation (19) for the nonharvest vector x corresponding
to the optimal sustainable harvesting policy described in
Theorem 10.8.2.

5. For the optimal sustainable harvesting policy described in The- 
orem 10.8.2, how many trees are removed from the forest during 
each harvest? 


If the price of trees in the second class is $30 and the price of 
trees in the third class is $50, which class should be completely 
harvested to attain the optimal sustainable yield? What is the 
optimal yield if there are 1000 trees in the forest? 


6. If all the growth parameters $g_1, g_2, \ldots, g_{n-1}$ in the growth
matrix G are equal, what should the ratio of the prices
$p_2 : p_3 : \cdots : p_n$ be in order that any sustainable harvesting policy
be an optimal sustainable harvesting policy? (See Exercise 3.)


2. In Example 1, to what level must the price of trees in the fifth
class rise so that the fifth class is the one to harvest completely
in order to attain the optimal sustainable yield?

3. In Example 1, what must the ratio of the prices $p_2 : p_3 : p_4 : p_5 : p_6$
be in order that the yields $\text{Yld}_k$, k = 2, 3, 4, 5, 6, all be the same?
(In this case, any sustainable harvesting policy will produce the
same optimal sustainable yield.)


Working with Technology 

The following exercises are designed to be solved using a technology
utility. Typically, this will be MATLAB, Mathematica, Maple,
Derive, or Mathcad, but it may also be some other type of linear
algebra software or a scientific calculator with some linear algebra
capabilities. For each exercise you will need to read the relevant
documentation for the particular utility you are using. The goal
of these exercises is to provide you with a basic proficiency with
your technology utility. Once you have mastered the techniques
in these exercises, you will be able to use your technology utility
to solve many of the problems in the regular exercise sets.


T1. A particular forest has growth parameters given by

$$g_i = \frac{1}{i}$$

for i = 1, 2, 3, . . . , n − 1, where n (the total number of height
classes) can be chosen as large as needed. Suppose that the value
of a tree in the kth height interval is given by

$$p_k = a(k - 1)^{\rho}$$

where a is a constant (in dollars) and ρ is a parameter satisfying
1 ≤ ρ ≤ 2.

(a) Show that the yield $\text{Yld}_k$ is given by

$$\text{Yld}_k = \frac{2a(k - 1)^{\rho - 1}s}{k}$$

(b) For

ρ = 1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9

use a computer to determine the class number that should be
completely harvested, and determine the optimal sustainable
yield in each case. Make sure that you allow k to take on only
integer values in your calculations.

(c) Repeat the calculations in part (b) using

ρ = 1.91, 1.92, 1.93, 1.94, 1.95, 1.96, 1.97, 1.98, 1.99

(d) Show that if ρ = 2, then the optimal sustainable yield can
never be larger than 2as.

(e) Compare the values of k determined in parts (b) and (c) to
1/(2 − ρ), and use some calculus to explain why

$$k \approx \frac{1}{2 - \rho}$$

T2. A particular forest has growth parameters given by

$$g_i = \frac{1}{2^i}$$

for i = 1, 2, 3, . . . , n − 1, where n (the total number of height
classes) can be chosen as large as needed. Suppose that the value
of a tree in the kth height interval is given by

$$p_k = a(k - 1)^{\rho}$$

where a is a constant (in dollars) and ρ is a parameter satisfying
1 ≤ ρ.

(a) Show that the yield $\text{Yld}_k$ is given by

$$\text{Yld}_k = \frac{a(k - 1)^{\rho}s}{2^k - 2}$$

(b) For

ρ = 1, 2, 3, 4, 5, 6, 7, 8, 9, 10

use a computer to determine the class number that should be
completely harvested in order to obtain an optimal yield, and
determine the optimal sustainable yield in each case. Make
sure that you allow k to take on only integer values in your
calculations.

(c) Compare the values of k determined in part (b) to 1 + ρ/ln(2)
and use some calculus to explain why

$$k \approx 1 + \frac{\rho}{\ln(2)}$$

10.9 Computer Graphics 

In this section we assume that a view of a three-dimensional object is displayed on a video 
screen and show how matrix algebra can be used to obtain new views of the object by 
rotation, translation, and scaling. 


Matrix Algebra 
Analytic Geometry 


Visualization of a Three-Dimensional Object

Suppose that we want to visualize a three-dimensional object by displaying various views
of it on a video screen. The object we have in mind to display is to be determined by a finite

number of straight line segments. As an example, consider the truncated right pyramid 
with hexagonal base illustrated in Figure 10.9.1. We first introduce an xyz-coordinate 
system in which to embed the object. As in Figure 10.9.1, we orient the coordinate 
system so that its origin is at the center of the video screen and the xy-plane coincides 
with the plane of the screen. Consequently, an observer will see only the projection of 
the view of the three-dimensional object onto the two-dimensional xy-plane.

► Figure 10.9.1

▲ View 1

In the xyz-coordinate system, the endpoints $P_1, P_2, \ldots, P_n$ of the straight line segments
that determine the view of the object will have certain coordinates, say,

$$(x_1, y_1, z_1), \quad (x_2, y_2, z_2), \quad \ldots, \quad (x_n, y_n, z_n)$$


These coordinates, together with a specification of which pairs are to be connected by 
straight line segments, are to be stored in the memory of the video display system. For 
example, assume that the 12 vertices of the truncated pyramid in Figure 10.9.1 have the 
following coordinates (the screen is 4 units wide by 3 units high): 


P_1: (1.000, −.800, .000),     P_2: (.500, −.800, −.866),
P_3: (−.500, −.800, −.866),    P_4: (−1.000, −.800, .000),
P_5: (−.500, −.800, .866),     P_6: (.500, −.800, .866),
P_7: (.840, −.400, .000),      P_8: (.315, .125, −.546),
P_9: (−.210, .650, −.364),     P_10: (−.360, .800, .000),
P_11: (−.210, .650, .364),     P_12: (.315, .125, .546)

These 12 vertices are connected pairwise by 18 straight line segments as follows, where
P_i ↔ P_j denotes that point P_i is connected to point P_j:

P_1 ↔ P_2,   P_2 ↔ P_3,   P_3 ↔ P_4,    P_4 ↔ P_5,    P_5 ↔ P_6,    P_6 ↔ P_1,
P_7 ↔ P_8,   P_8 ↔ P_9,   P_9 ↔ P_10,   P_10 ↔ P_11,  P_11 ↔ P_12,  P_12 ↔ P_7,
P_1 ↔ P_7,   P_2 ↔ P_8,   P_3 ↔ P_9,    P_4 ↔ P_10,   P_5 ↔ P_11,   P_6 ↔ P_12


In View 1 these 18 straight line segments are shown as they would appear on the video 
screen. It should be noticed that only the x- and y-coordinates of the vertices are needed 
by the video display system to draw the view, because only the projection of the object 
onto the xy-plane is displayed. However, we must keep track of the z-coordinates to 
carry out certain transformations discussed later. 

We now show how to form new views of the object by scaling, translating, or rotating 
the initial view. We first construct a 3 x n matrix P, referred to as the coordinate matrix 
of the view, whose columns are the coordinates of the n points of a view: 


$$P = \begin{bmatrix}
x_1 & x_2 & \cdots & x_n \\
y_1 & y_2 & \cdots & y_n \\
z_1 & z_2 & \cdots & z_n
\end{bmatrix}$$

For example, the coordinate matrix P corresponding to View 1 is the 3 × 12 matrix

$$\begin{bmatrix}
1.000 & .500 & -.500 & -1.000 & -.500 & .500 & .840 & .315 & -.210 & -.360 & -.210 & .315 \\
-.800 & -.800 & -.800 & -.800 & -.800 & -.800 & -.400 & .125 & .650 & .800 & .650 & .125 \\
.000 & -.866 & -.866 & .000 & .866 & .866 & .000 & -.546 & -.364 & .000 & .364 & .546
\end{bmatrix}$$


We will show below how to transform the coordinate matrix P of a view to a new 
coordinate matrix P' corresponding to a new view of the object. The straight line segments 
connecting the various points move with the points as they are transformed. In this way, 
each view is uniquely determined by its coordinate matrix once we have specified which 
pairs of points in the original view are to be connected by straight lines. 
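
One way to hold a view in memory is as a vertex list plus a list of index pairs. The sketch below is an illustration under assumptions: the names `VERTICES`, `EDGES`, `coordinate_matrix`, and `projected_segments` are hypothetical, and the edge list follows the pattern read from the connection list above (six base edges, six top edges, six risers). It builds the 3 × 12 coordinate matrix and the 2-D segments the screen would actually draw.

```python
# Vertices P1..P12 of the truncated pyramid as (x, y, z), from the text.
VERTICES = [
    (1.000, -0.800, 0.000), (0.500, -0.800, -0.866),
    (-0.500, -0.800, -0.866), (-1.000, -0.800, 0.000),
    (-0.500, -0.800, 0.866), (0.500, -0.800, 0.866),
    (0.840, -0.400, 0.000), (0.315, 0.125, -0.546),
    (-0.210, 0.650, -0.364), (-0.360, 0.800, 0.000),
    (-0.210, 0.650, 0.364), (0.315, 0.125, 0.546),
]

# Segments as 1-based pairs (i, j), meaning P_i is joined to P_j (assumed pattern).
EDGES = (
    [(i, i % 6 + 1) for i in range(1, 7)]            # base hexagon P1..P6
    + [(i, (i - 6) % 6 + 7) for i in range(7, 13)]   # top hexagon P7..P12
    + [(i, i + 6) for i in range(1, 7)]              # risers P_i to P_{i+6}
)

def coordinate_matrix(vertices):
    """Return the 3 x n matrix (list of rows) whose columns are the points."""
    return [list(row) for row in zip(*vertices)]

def projected_segments(P, edges):
    """2-D segments seen on the screen: only x- and y-coordinates are used."""
    xs, ys = P[0], P[1]
    return [((xs[i - 1], ys[i - 1]), (xs[j - 1], ys[j - 1])) for i, j in edges]

P = coordinate_matrix(VERTICES)
segments = projected_segments(P, EDGES)
print(len(P), len(P[0]), len(segments))
```

Keeping the edge list separate from P is what lets the segments "move with the points": a transformation changes only P, and the same index pairs are reused to draw the new view.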


Scaling 



The first type of transformation we consider consists of scaling a view along the x,
y, and z directions by factors of α, β, and γ, respectively. By this we mean that if a
point $P_i$ has coordinates $(x_i, y_i, z_i)$ in the original view, it is to move to a new point $P_i'$
with coordinates $(\alpha x_i, \beta y_i, \gamma z_i)$ in the new view. This has the effect of transforming a
unit cube in the original view to a rectangular parallelepiped of dimensions α × β × γ
(Figure 10.9.2). Mathematically, this may be accomplished with matrix multiplication
as follows. Define a 3 × 3 diagonal matrix

$$S = \begin{bmatrix} \alpha & 0 & 0 \\ 0 & \beta & 0 \\ 0 & 0 & \gamma \end{bmatrix}$$

Then, if a point $P_i$ in the original view is represented by the column vector

$$\begin{bmatrix} x_i \\ y_i \\ z_i \end{bmatrix}$$

then the transformed point $P_i'$ is represented by the column vector

$$\begin{bmatrix} x_i' \\ y_i' \\ z_i' \end{bmatrix}
= \begin{bmatrix} \alpha & 0 & 0 \\ 0 & \beta & 0 \\ 0 & 0 & \gamma \end{bmatrix}
\begin{bmatrix} x_i \\ y_i \\ z_i \end{bmatrix}$$

▲ Figure 10.9.2

Using the coordinate matrix P, which contains the coordinates of all n points of the
original view as its columns, we can transform these n points simultaneously to produce
the coordinate matrix P' of the scaled view, as follows:

$$SP = \begin{bmatrix} \alpha & 0 & 0 \\ 0 & \beta & 0 \\ 0 & 0 & \gamma \end{bmatrix}
\begin{bmatrix} x_1 & x_2 & \cdots & x_n \\ y_1 & y_2 & \cdots & y_n \\ z_1 & z_2 & \cdots & z_n \end{bmatrix}
= \begin{bmatrix} \alpha x_1 & \alpha x_2 & \cdots & \alpha x_n \\ \beta y_1 & \beta y_2 & \cdots & \beta y_n \\ \gamma z_1 & \gamma z_2 & \cdots & \gamma z_n \end{bmatrix} = P'$$

▲ View 2 View 1 scaled by α = 1.8, β = 0.5, γ = 3.0.

The new coordinate matrix can then be entered into the video display system to produce
the new view of the object. As an example, View 2 is View 1 scaled by setting α = 1.8,
β = 0.5, and γ = 3.0. Note that the scaling γ = 3.0 along the z-axis is not visible in
View 2, since we see only the projection of the object onto the xy-plane.
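
As a small illustration of the product SP, the sketch below scales a unit square lying in the xy-plane, using the same factors as View 2. The square and the helper names `scaling_matrix` and `mat_mul` are assumptions made for the example; they are not taken from the text.

```python
def scaling_matrix(alpha, beta, gamma):
    """The 3 x 3 diagonal matrix S with scale factors along x, y, and z."""
    return [[alpha, 0.0, 0.0],
            [0.0, beta, 0.0],
            [0.0, 0.0, gamma]]

def mat_mul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

# Coordinate matrix of a unit square (columns are the four corner points).
P = [[0.0, 1.0, 1.0, 0.0],
     [0.0, 0.0, 1.0, 1.0],
     [0.0, 0.0, 0.0, 0.0]]

S = scaling_matrix(1.8, 0.5, 3.0)
P_scaled = mat_mul(S, P)          # P' = SP scales all points at once
for row in P_scaled:
    print(row)
```

Each row of P is simply multiplied by its own factor, which is why a diagonal S never mixes the x-, y-, and z-coordinates of a point.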
Translation

We next consider the transformation of translating or displacing an object to a new
position on the screen. Referring to Figure 10.9.3, suppose we desire to change an
existing view so that each point $P_i$ with coordinates $(x_i, y_i, z_i)$ moves to a new point $P_i'$
with coordinates $(x_i + x_0, y_i + y_0, z_i + z_0)$. The vector

$$\begin{bmatrix} x_0 \\ y_0 \\ z_0 \end{bmatrix}$$

is called the translation vector of the transformation. By defining a 3 × n matrix T as

$$T = \begin{bmatrix} x_0 & x_0 & \cdots & x_0 \\ y_0 & y_0 & \cdots & y_0 \\ z_0 & z_0 & \cdots & z_0 \end{bmatrix}$$

we can translate all n points of the view determined by the coordinate matrix P by matrix
addition via the equation

$$P' = P + T$$

The coordinate matrix P' then specifies the new coordinates of the n points. For example,
if we wish to translate View 1 according to the translation vector

$$\begin{bmatrix} 1.2 \\ 0.4 \\ 1.7 \end{bmatrix}$$

the result is View 3. Note, again, that the translation $z_0 = 1.7$ along the z-axis does not
show up explicitly in View 3.

▲ View 3 View 1 translated by $x_0 = 1.2$, $y_0 = 0.4$, $z_0 = 1.7$.

In Exercise 7, a technique of performing translations by matrix multiplication rather
than by matrix addition is explained.
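
The translation P' = P + T can be sketched without forming T explicitly, since every column of T is the same vector. The example below uses a unit square as a stand-in view (an assumption for illustration) and the translation vector of View 3; the function name `translate` is hypothetical.

```python
def translate(P, x0, y0, z0):
    """Return P + T, where every column of T is the vector (x0, y0, z0)."""
    offsets = (x0, y0, z0)
    return [[v + d for v in row] for d, row in zip(offsets, P)]

# Coordinate matrix of a unit square (columns are the four corner points).
P = [[0.0, 1.0, 1.0, 0.0],
     [0.0, 0.0, 1.0, 1.0],
     [0.0, 0.0, 0.0, 0.0]]

P_moved = translate(P, 1.2, 0.4, 1.7)
for row in P_moved:
    print([round(v, 3) for v in row])
```

Unlike scaling and rotation, this operation is an addition, so it cannot be folded into a product of 3 × 3 matrices; that is the gap the matrix-multiplication technique of Exercise 7 is meant to close.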


► Figure 10.9.3 



Rotation 



A more complicated type of transformation is a rotation of a view about one of the three
coordinate axes. We begin with a rotation about the z-axis (the axis perpendicular to
the screen) through an angle θ. Given a point $P_i$ in the original view with coordinates
$(x_i, y_i, z_i)$, we wish to compute the new coordinates $(x_i', y_i', z_i')$ of the rotated point $P_i'$.
Referring to Figure 10.9.4 and using a little trigonometry, you should be able to derive
the following:

$$\begin{aligned}
x_i' &= \rho\cos(\phi + \theta) = \rho\cos\phi\cos\theta - \rho\sin\phi\sin\theta = x_i\cos\theta - y_i\sin\theta \\
y_i' &= \rho\sin(\phi + \theta) = \rho\cos\phi\sin\theta + \rho\sin\phi\cos\theta = x_i\sin\theta + y_i\cos\theta \\
z_i' &= z_i
\end{aligned}$$

▲ Figure 10.9.4

These equations can be written in matrix form as

$$\begin{bmatrix} x_i' \\ y_i' \\ z_i' \end{bmatrix}
= \begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} x_i \\ y_i \\ z_i \end{bmatrix}$$

If we let R denote the 3 × 3 matrix in this equation, all n points can be rotated by the
matrix product

$$P' = RP$$

to yield the coordinate matrix P' of the rotated view.

Rotations about the x- and y-axes can be accomplished analogously, and the resulting
rotation matrices are given with Views 4, 5, and 6. These three new views of the truncated
pyramid correspond to rotations of View 1 about the x-, y-, and z-axes, respectively, each
through an angle of 90°.
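
The three rotation matrices can be written as small functions. In the sketch below, `rotation_z` is the matrix R derived above; `rotation_x` and `rotation_y` are assumed to follow the same convention as the matrices R_1 and R_2 used for View 7 in this section. The function names and the unit-square test view are hypothetical.

```python
import math

def rotation_x(deg):
    """Rotation about the x-axis through an angle given in degrees."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]

def rotation_y(deg):
    """Rotation about the y-axis through an angle given in degrees."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def rotation_z(deg):
    """Rotation about the z-axis through an angle given in degrees."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def mat_mul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

# Coordinate matrix of a unit square (columns are the four corner points).
P = [[0.0, 1.0, 1.0, 0.0],
     [0.0, 0.0, 1.0, 1.0],
     [0.0, 0.0, 0.0, 0.0]]

P_rot = mat_mul(rotation_z(90), P)     # P' = RP
for row in P_rot:
    print([round(v, 3) for v in row])
```

A 90° rotation about the z-axis sends the corner (1, 0, 0) to (0, 1, 0) and leaves every z-coordinate unchanged, which matches the third equation z_i' = z_i.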



▲ View 4 View 1 rotated 90° about the x-axis.

▲ View 5 View 1 rotated 90° about the y-axis.

▲ View 6 View 1 rotated 90° about the z-axis.

▲ View 7 Oblique view of truncated pyramid.

Rotations about three coordinate axes may be combined to give oblique views of
an object. For example, View 7 is View 1 rotated first about the x-axis through 30°,
then about the y-axis through −70°, and finally about the z-axis through −27°.
Mathematically, these three successive rotations can be embodied in the single transformation
equation P' = RP, where R is the product of three individual rotation matrices:

$$R_1 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos(30^\circ) & -\sin(30^\circ) \\ 0 & \sin(30^\circ) & \cos(30^\circ) \end{bmatrix}$$

$$R_2 = \begin{bmatrix} \cos(-70^\circ) & 0 & \sin(-70^\circ) \\ 0 & 1 & 0 \\ -\sin(-70^\circ) & 0 & \cos(-70^\circ) \end{bmatrix}$$

$$R_3 = \begin{bmatrix} \cos(-27^\circ) & -\sin(-27^\circ) & 0 \\ \sin(-27^\circ) & \cos(-27^\circ) & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

in the order

$$R = R_3R_2R_1 = \begin{bmatrix} .305 & -.025 & -.952 \\ -.155 & .985 & -.076 \\ .940 & .171 & .296 \end{bmatrix}$$
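
The product R = R_3R_2R_1 can be reproduced numerically. The sketch below is self-contained and uses the angles that appear in the three rotation matrices (30° about x, −70° about y, −27° about z); the helper names are hypothetical. The order of multiplication matters: the first rotation applied is the rightmost factor.

```python
import math

def rotation_x(deg):
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]

def rotation_y(deg):
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def rotation_z(deg):
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def mat_mul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

R1 = rotation_x(30)      # applied first
R2 = rotation_y(-70)     # applied second
R3 = rotation_z(-27)     # applied last

R = mat_mul(R3, mat_mul(R2, R1))
for row in R:
    print([round(v, 3) for v in row])
```

To three decimal places the entries agree with the matrix R printed above. Reversing the order, for example computing R_1R_2R_3 instead, generally gives a different matrix and therefore a different view.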


As a final illustration, in View 8 we have two separate views of the truncated pyramid,
which constitute a stereoscopic pair. They were produced by first rotating View 7 about
the y-axis through an angle of −3° and translating it to the right, then rotating the
same View 7 about the y-axis through an angle of +3° and translating it to the left.
The translation distances were chosen so that the stereoscopic views are about 2 1/2 inches
apart, the approximate distance between a pair of eyes.



▲ View 8 Stereoscopic figure of truncated pyramid. The three-dimensionality of the 
diagram can be seen by holding the book about one foot away and focusing on a distant 
object. Then by shifting your gaze to View 8 without refocusing, you can make the two 
views of the stereoscopic pair merge together and produce the desired effect. 





Exercise Set 10.9 

1. View 9 is a view of a square with vertices (0, 0, 0), (1, 0, 0),
(1, 1, 0), and (0, 1, 0).

(a) What is the coordinate matrix of View 9?

(b) What is the coordinate matrix of View 9 after it is scaled
by a factor 1 1/2 in the x-direction and 1/2 in the y-direction?
Draw a sketch of the scaled view.

(c) What is the coordinate matrix of View 9 after it is translated
by the following vector?

\[
\begin{bmatrix} -2 \\ -1 \\ 3 \end{bmatrix}
\]

Draw a sketch of the translated view.

(d) What is the coordinate matrix of View 9 after it is rotated
through an angle of −30° about the z-axis? Draw a sketch
of the rotated view.

▲ View 9 Square with vertices (0, 0, 0), (1, 0, 0), (1, 1, 0), and
(0, 1, 0) (Exercises 1 and 2).

2. (a) If the coordinate matrix of View 9 is multiplied by the
matrix

\[
\begin{bmatrix}
1 & \tfrac{1}{2} & 0 \\
0 & 1 & 0 \\
0 & 0 & 1
\end{bmatrix}
\]

the result is the coordinate matrix of View 10. Such a transformation
is called a shear in the x-direction with factor 1/2
with respect to the y-coordinate. Show that under such a
transformation, a point with coordinates (x_i, y_i, z_i) has
new coordinates (x_i + (1/2)y_i, y_i, z_i).

(b) What are the coordinates of the four vertices of the sheared
square in View 10?

▲ View 10 View 9 sheared along the x-axis by 1/2 with
respect to the y-coordinate (Exercise 2).


(c) The matrix

\[
\begin{bmatrix}
1 & 0 & 0 \\
.6 & 1 & 0 \\
0 & 0 & 1
\end{bmatrix}
\]

determines a shear in the y-direction with factor .6 with respect
to the x-coordinate (an example appears in View 11).
Sketch a view of the square in View 9 after such a shearing
transformation, and find the new coordinates of its four
vertices.


▲ View 11 View 1 sheared along the y-axis by .6 with
respect to the x-coordinate (Exercise 2).

3. (a) The reflection about the xz-plane is defined as the transformation
that takes a point (x_i, y_i, z_i) to the point
(x_i, −y_i, z_i) (e.g., View 12). If P and P' are the coordinate
matrices of a view and its reflection about the xz-plane,
respectively, find a matrix M such that P' = MP.

(b) Analogous to part (a), define the reflection about the yz- 
plane and construct the corresponding transformation 
matrix. Draw a sketch of View 1 reflected about the yz- 
plane. 

(c) Analogous to part (a), define the reflection about the xy- 
plane and construct the corresponding transformation 
matrix. Draw a sketch of View 1 reflected about the xy- 
plane. 



▲ View 12 View 1 reflected
about the xz-plane (Exercise 3).

4. (a) View 13 is View 1 subject to the following five transformations:

1. Scale by a factor of 1/2 in the x-direction, 2 in the y-direction,
and 1/3 in the z-direction.

2. Translate 1/2 unit in the x-direction.

3. Rotate 20° about the x-axis. 

4. Rotate —45° about the y-axis. 

5. Rotate 90° about the z-axis. 

Construct the five matrices M_1, M_2, M_3, M_4, and M_5
associated with these five transformations.

(b) If P is the coordinate matrix of View 1 and P' is the coordinate
matrix of View 13, express P' in terms of M_1, M_2,
M_3, M_4, M_5, and P.



▲ View 13 View 1 scaled, 
translated, and rotated 
(Exercise 4). 

5. (a) View 14 is View 1 subject to the following seven transfor- 
mations: 

1. Scale by a factor of .3 in the x-direction and by a 
factor of .5 in the y-direction. 

2. Rotate 45° about the x-axis. 

3. Translate 1 unit in the x-direction. 

4. Rotate 35° about the y-axis.

5. Rotate —45° about the z-axis. 

6. Translate 1 unit in the z-direction. 

7. Scale by a factor of 2 in the x-direction. 

Construct the matrices M_1, M_2, ..., M_7 associated with
these seven transformations.

(b) If P is the coordinate matrix of View 1 and P' is the
coordinate matrix of View 14, express P' in terms of
M_1, M_2, ..., M_7, and P.



▲ View 14 View 1 scaled, 
translated, and rotated 
(Exercise 5). 


6. Suppose that a view with coordinate matrix P is to be rotated
through an angle θ about an axis through the origin and specified
by two angles α and β (see Figure Ex-6). If P' is the
coordinate matrix of the rotated view, find rotation matrices
R_1, R_2, R_3, R_4, and R_5 such that

P' = R_5 R_4 R_3 R_2 R_1 P

[Hint: The desired rotation can be accomplished in the following
five steps:

1. Rotate through an angle of β about the y-axis.

2. Rotate through an angle of α about the z-axis.

3. Rotate through an angle of θ about the y-axis.

4. Rotate through an angle of −α about the z-axis.

5. Rotate through an angle of −β about the y-axis.]


◄ Figure Ex-6



7. This exercise illustrates a technique for translating a point with
coordinates (x_i, y_i, z_i) to a point with coordinates (x_i + x_0,
y_i + y_0, z_i + z_0) by matrix multiplication rather than matrix
addition.

(a) Let the point (x_i, y_i, z_i) be associated with the column
vector

\[
\mathbf{v}_i = \begin{bmatrix} x_i \\ y_i \\ z_i \\ 1 \end{bmatrix}
\]

and let the point (x_i + x_0, y_i + y_0, z_i + z_0) be associated
with the column vector

\[
\mathbf{v}_i' = \begin{bmatrix} x_i + x_0 \\ y_i + y_0 \\ z_i + z_0 \\ 1 \end{bmatrix}
\]

Find a 4 × 4 matrix M such that v_i' = M v_i.

(b) Find the specific 4 × 4 matrix of the above form that
will effect the translation of the point (4, −2, 3) to the
point (−1, 7, 0).


8. For the three rotation matrices given with Views 4, 5, and 6,
show that

\[
R^{-1} = R^{T}
\]

(A matrix with this property is called an orthogonal matrix.
See Section 7.1.)






Working with Technology

The following exercises are designed to be solved using a technology
utility. Typically, this will be MATLAB, Mathematica, Maple,
Derive, or Mathcad, but it may also be some other type of linear 
algebra software or a scientific calculator with some linear algebra 
capabilities. For each exercise you will need to read the relevant 
documentation for the particular utility you are using. The goal 
of these exercises is to provide you with a basic proficiency with 
your technology utility. Once you have mastered the techniques 
in these exercises, you will be able to use your technology utility 
to solve many of the problems in the regular exercise sets. 


T1. Let (a, b, c) be a unit vector normal to the plane ax + by +
cz = 0, and let r = (x, y, z) be a vector. It can be shown that the
mirror image of the vector r through the above plane has coordinates
r_m = (x_m, y_m, z_m), where

\[
\begin{bmatrix} x_m \\ y_m \\ z_m \end{bmatrix}
= M \begin{bmatrix} x \\ y \\ z \end{bmatrix}
\]

with

\[
M = I - 2\mathbf{n}\mathbf{n}^{T}
= \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
- 2 \begin{bmatrix} a \\ b \\ c \end{bmatrix}
\begin{bmatrix} a & b & c \end{bmatrix}
\]

(a) Show that M^2 = I and give a physical reason why this must
be so. [Hint: Use the fact that (a, b, c) is a unit vector to show
that n^T n = 1.]

(b) Use a computer to show that det(M) = −1.


(c) The eigenvectors of M satisfy the equation

\[
\begin{bmatrix} x_m \\ y_m \\ z_m \end{bmatrix}
= M \begin{bmatrix} x \\ y \\ z \end{bmatrix}
= \lambda \begin{bmatrix} x \\ y \\ z \end{bmatrix}
\]

and therefore correspond to those vectors whose direction is
not affected by a reflection through the plane. Use a computer
to determine the eigenvectors and eigenvalues of M, and then
give a physical argument to support your answer.

T2. A vector v = (x, y, z) is rotated by an angle θ about an axis
having unit vector (a, b, c), thereby forming the rotated vector
v_R = (x_R, y_R, z_R). It can be shown that

\[
\begin{bmatrix} x_R \\ y_R \\ z_R \end{bmatrix}
= R(\theta) \begin{bmatrix} x \\ y \\ z \end{bmatrix}
\]

with

\[
R(\theta) = \cos(\theta)
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
+ (1 - \cos(\theta))
\begin{bmatrix} a \\ b \\ c \end{bmatrix}
\begin{bmatrix} a & b & c \end{bmatrix}
+ \sin(\theta)
\begin{bmatrix} 0 & -c & b \\ c & 0 & -a \\ -b & a & 0 \end{bmatrix}
\]

(a) Use a computer to show that R(θ)R(φ) = R(θ + φ), and then
give a physical reason why this must be so. Depending on the
sophistication of the computer you are using, you may have to
experiment using different values of a, b, and

\[
c = \sqrt{1 - a^2 - b^2}
\]

(b) Show also that R^{-1}(θ) = R(−θ) and give a physical reason
why this must be so.

(c) Use a computer to show that det(R(θ)) = +1.
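
For readers who want a starting point for Exercise T2, here is a minimal Python sketch of R(θ). It assumes the standard axis-angle form in which the matrix multiplying sin(θ) has rows (0, −c, b), (c, 0, −a), and (−b, a, 0); the function name `axis_rotation` is invented for this example. The sketch does not answer the exercise. It only confirms one special case: for the axis (0, 0, 1), R(θ) reduces to the rotation about the z-axis used in Section 10.9.

```python
import math


def axis_rotation(a, b, c, theta_deg):
    """R(theta) for a rotation about the unit vector (a, b, c), angle in degrees."""
    t = math.radians(theta_deg)
    cos_t, sin_t = math.cos(t), math.sin(t)
    n = (a, b, c)
    identity = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
    outer = [[n[i] * n[j] for j in range(3)] for i in range(3)]   # n n^T
    cross = [[0.0, -c, b],                                        # assumed sign convention
             [c, 0.0, -a],
             [-b, a, 0.0]]
    return [[cos_t * identity[i][j]
             + (1.0 - cos_t) * outer[i][j]
             + sin_t * cross[i][j]
             for j in range(3)]
            for i in range(3)]


# Special case: rotation through 90 degrees about the z-axis.
R = axis_rotation(0.0, 0.0, 1.0, 90)
```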


10.10 Equilibrium Temperature Distributions

In this section we will see that the equilibrium temperature distribution within a trapezoidal 
plate can be found when the temperatures around the edges of the plate are specified. The 
problem is reduced to solving a system of linear equations. Also, an iterative technique for 
solving the problem and a “random walk” approach to the problem are described. 

PREREQUISITES: Linear Systems
Matrices
Intuitive Understanding of Limits


Boundary Data

Suppose that the two faces of the thin trapezoidal plate shown in Figure 10.10.1a are
insulated from heat. Suppose that we are also given the temperature along the four edges
of the plate. For example, let the temperature be constant on each edge with values of
0°, 0°, 1°, and 2°, as in the figure. After a period of time, the temperature inside the
plate will stabilize. Our objective in this section is to determine this equilibrium
temperature distribution at the points inside the plate. As we will see, the interior equilibrium
temperature is completely determined by the boundary data, that is, the temperature
along the edges of the plate.


▲ Figure 10.10.2


► Figure 10.10.1 




The equilibrium temperature distribution can be visualized by the use of curves that
connect points of equal temperature. Such curves are called isotherms of the temperature
distribution. In Figure 10.10.1b we have sketched a few isotherms, using information we
derive later in the chapter.

Although all our calculations will be for the trapezoidal plate illustrated, our tech- 
niques generalize easily to a plate of any practical shape. They also generalize to the 
problem of finding the temperature within a three-dimensional body. In fact, our “plate” 
could be the cross section of some solid object if the flow of heat perpendicular to the 
cross section is negligible. For example, Figure 10.10.1 could represent the cross section 
of a long dam. The dam is exposed to three different temperatures: the temperature of 
the ground at its base, the temperature of the water on one side, and the temperature of 
the air on the other side. A knowledge of the temperature distribution inside the dam is 
necessary to determine the thermal stresses to which it is subjected. 

Next we will consider a certain thermodynamic principle that characterizes the tem- 
perature distribution we are seeking. 


There are many different ways to obtain a mathematical model for our problem. The 
approach we use is based on the following property of equilibrium temperature distri- 
butions. 


The Mean-Value Property 

Let a plate be in thermal equilibrium and let P be a point inside the plate. Then if C is
any circle with center at P that is completely contained in the plate, the temperature at
P is the average value of the temperature on the circle (Figure 10.10.2).


This property is a consequence of certain basic laws of molecular motion, and we will not 
attempt to derive it. Basically, this property states that in equilibrium, thermal energy 
tends to distribute itself as evenly as possible consistent with the boundary conditions. 
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Discrete Formulation of the 
Problem 


It can be shown that the mean-value property uniquely determines the equilibrium tem- 
perature distribution of a plate. 

Unfortunately, determining the equilibrium temperature distribution from the mean- 
value property is not an easy matter. However, if we restrict ourselves to finding the 
temperature only at a finite set of points within the plate, the problem can be reduced to 
solving a linear system. We pursue this idea next. 


We can overlay our trapezoidal plate with a succession of finer and finer square nets or
meshes (Figure 10.10.3). In (a) we have a rather coarse net; in (b) we have a net with half
the spacing as in (a); and in (c) we have a net with the spacing again reduced by half.
The points of intersection of the net lines are called mesh points. We classify them as
boundary mesh points if they fall on the boundary of the plate or as interior mesh points
if they lie in the interior of the plate. For the three net spacings we have chosen, there
are 1, 9, and 49 interior mesh points, respectively.



▲ Figure 10.10.3 (a) 1 interior mesh point; (b) 9 interior mesh points; (c) 49 interior mesh points.

In the discrete formulation of our problem, we try to find the temperature only at 
the interior mesh points of some particular net. For a rather fine net, as in (c), this will 
provide an excellent picture of the temperature distribution throughout the entire plate. 

At the boundary mesh points, the temperature is given by the boundary data. (In 
Figure 10.10.3 we have labeled all the boundary mesh points with their corresponding 
temperatures.) At the interior mesh points, we will apply the following discrete version 
of the mean-value property. 


Discrete Mean-Value Property 

At each interior mesh point, the temperature is approximately the average of the
temperatures at the four neighboring mesh points.


This discrete version is a reasonable approximation to the true mean-value property. But 
because it is only an approximation, it will provide only an approximation to the true 
temperatures at the interior mesh points. However, the approximations will get better as 
the mesh spacing decreases. In fact, as the mesh spacing approaches zero, the approxi- 
mations approach the exact temperature distribution, a fact proved in advanced courses 
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in numerical analysis. We will illustrate this convergence by computing the approximate 
temperatures at the mesh points for the three mesh spacings given in Figure 10.10.3. 

Case (a) of Figure 10.10.3 is simple, for there is only one interior mesh point. If we let
t_0 be the temperature at this mesh point, the discrete mean-value property immediately
gives

\[
t_0 = \tfrac{1}{4}(2 + 1 + 0 + 0) = 0.75
\]

In case (b) we can label the temperatures at the nine interior mesh points t_1, t_2, ..., t_9,
as in Figure 10.10.3b. (The particular ordering is not important.) By applying the
discrete mean-value property successively to each of these nine mesh points, we obtain
the following nine equations:


t\ : 
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t = M t + b 
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To solve Equation (2), we write it as 


\[
(I - M)\mathbf{t} = \mathbf{b}
\]

The solution for t is thus

\[
\mathbf{t} = (I - M)^{-1}\mathbf{b} \tag{3}
\]

as long as the matrix (I − M) is invertible. This is indeed the case, and the solution for
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t as calculated by (3) is 


\[
\mathbf{t} = \begin{bmatrix}
0.7846 \\ 1.1383 \\ 0.4719 \\ 1.2967 \\ 0.7491 \\ 0.3265 \\ 1.2995 \\ 0.9014 \\ 0.5570
\end{bmatrix} \tag{4}
\]


Figure 10.10.4 is a diagram of the plate with the nine interior mesh points labeled with 
their temperatures as given by this solution. 
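
The solution (4) can be reproduced with a short program. The Python sketch below solves (I − M)t = b by Gaussian elimination. The table `MESH` is an assumption: it lists, for each of the nine interior points, which interior points and which boundary temperatures are its four neighbors, and it was chosen to be consistent with the temperatures in (4) and with the Jacobi iterates given later in this section. The function name `solve` is likewise invented for the example.

```python
# Assumed mesh for case (b): interior point -> (interior neighbours, boundary temperatures).
MESH = {
    1: ([2], [2, 0, 0]),
    2: ([1, 3, 4], [2]),
    3: ([2, 5], [0, 0]),
    4: ([2, 5, 7], [2]),
    5: ([3, 4, 6, 8], []),
    6: ([5, 9], [0, 0]),
    7: ([4, 8], [2, 1]),
    8: ([5, 7, 9], [1]),
    9: ([6, 8], [1, 0]),
}

n = len(MESH)
A = [[0.0] * n for _ in range(n)]   # A = I - M
b = [0.0] * n
for i, (interior, boundary) in MESH.items():
    A[i - 1][i - 1] = 1.0
    for j in interior:
        A[i - 1][j - 1] = -0.25      # entries of -M
    b[i - 1] = sum(boundary) / 4     # boundary contribution


def solve(A, b):
    """Solve Ax = b by Gaussian elimination with partial pivoting."""
    size = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, size):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, size + 1):
                aug[r][k] -= factor * aug[col][k]
    x = [0.0] * size
    for i in range(size - 1, -1, -1):
        tail = sum(aug[i][k] * x[k] for k in range(i + 1, size))
        x[i] = (aug[i][size] - tail) / aug[i][i]
    return x


t = solve(A, b)
print([round(value, 4) for value in t])
```

Solving the system directly avoids forming the inverse (I − M)^{-1} explicitly, which is the usual practice in numerical work.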


► Figure 10.10.4 



For case (c) of Figure 10.10.3, we repeat this same procedure. We label the temperatures
at the 49 interior mesh points as t_1, t_2, ..., t_49 in some manner. For example, we
may begin at the top of the plate and proceed from left to right along each row of mesh
points. Applying the discrete mean-value property to each mesh point gives a system of
49 linear equations in 49 unknowns:

\[
\begin{aligned}
t_1 &= \tfrac{1}{4}(t_2 + 2 + 0 + 0) \\
t_2 &= \tfrac{1}{4}(t_1 + t_3 + t_4 + 2) \\
&\ \,\vdots \\
t_{48} &= \tfrac{1}{4}(t_{41} + t_{47} + t_{49} + 1) \\
t_{49} &= \tfrac{1}{4}(t_{42} + t_{48} + 0 + 1)
\end{aligned} \tag{5}
\]

In matrix form, Equations (5) are

\[
\mathbf{t} = M\mathbf{t} + \mathbf{b}
\]

where t and b are column vectors with 49 entries, and M is a 49 × 49 matrix. As in (3),
the solution for t is

\[
\mathbf{t} = (I - M)^{-1}\mathbf{b} \tag{6}
\]

In Figure 10.10.5 we display the temperatures at the 49 mesh points found by Equa- 
tion (6). The nine unshaded temperatures in this figure fall on the mesh points of 
Figure 10.10.4. In Table 1 we compare the temperatures at these nine common mesh 
points for the three different mesh spacings used. 
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Table 1 



Temperatures at Common Mesh Points

          Case (a)    Case (b)    Case (c)
t_1          -         0.7846      0.8048
t_2          -         1.1383      1.1533
t_3          -         0.4719      0.4778
t_4          -         1.2967      1.3078
t_5        0.7500      0.7491      0.7513
t_6          -         0.3265      0.3157
t_7          -         1.2995      1.3042
t_8          -         0.9014      0.9032
t_9          -         0.5570      0.5554


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A Numerical Technique 


Knowing that the temperatures of the discrete problem approach the exact tempera- 
tures as the mesh spacing decreases, we may surmise that the nine temperatures obtained 
in case (c) are closer to the exact values than those in case (b). 


To obtain the 49 temperatures in case (c) of Figure 10.10.3, it was necessary to solve a 
linear system with 49 unknowns. A finer net might involve a linear system with hundreds 
or even thousands of unknowns. Exact algorithms for the solutions of such large systems 
are impractical, and for this reason we now discuss a numerical technique for the practical 
solution of these systems. 

To describe this technique, we look again at Equation (2):

\[
\mathbf{t} = M\mathbf{t} + \mathbf{b} \tag{7}
\]

The vector t we are seeking appears on both sides of this equation. We consider a way
of generating better and better approximations to the vector solution t. For the initial
approximation t^(0) we can take t^(0) = 0 if no better choice is available. If we substitute
t^(0) into the right side of (7) and label the resulting left side as t^(1), we have

\[
\mathbf{t}^{(1)} = M\mathbf{t}^{(0)} + \mathbf{b} \tag{8}
\]

If we substitute t^(1) into the right side of (7), we generate another approximation, which
we label t^(2):

\[
\mathbf{t}^{(2)} = M\mathbf{t}^{(1)} + \mathbf{b} \tag{9}
\]

Continuing in this way, we generate a sequence of approximations as follows:

\[
\begin{aligned}
\mathbf{t}^{(1)} &= M\mathbf{t}^{(0)} + \mathbf{b} \\
\mathbf{t}^{(2)} &= M\mathbf{t}^{(1)} + \mathbf{b} \\
\mathbf{t}^{(3)} &= M\mathbf{t}^{(2)} + \mathbf{b} \\
&\ \,\vdots \\
\mathbf{t}^{(n)} &= M\mathbf{t}^{(n-1)} + \mathbf{b}
\end{aligned}
\]

One would hope that this sequence of approximations t^(0), t^(1), t^(2), ... converges to
the exact solution of (7). We do not have the space here to go into the theoretical
considerations necessary to show this. Suffice it to say that for the particular problem
we are considering, the sequence converges to the exact solution for any mesh size and
for any initial approximation t^(0).

This technique of generating successive approximations to the solution of (7) is a
variation of a technique called Jacobi iteration; the approximations themselves are called
iterates. As a numerical example, let us apply Jacobi iteration to the calculation of the
nine mesh point temperatures of case (b). Setting t^(0) = 0, we have, from Equation (2),

.5000" 
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t (I) = M t (0) + b = M0 + b = b = 
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t (2) = M t (I) + b 
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L - 1 


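Some additional iterates are

\[
\mathbf{t}^{(3)} = \begin{bmatrix}
0.6875 \\ 0.8906 \\ 0.2344 \\ 0.9688 \\ 0.3750 \\ 0.1250 \\ 1.0781 \\ 0.6094 \\ 0.3906
\end{bmatrix},
\quad
\mathbf{t}^{(10)} = \begin{bmatrix}
0.7791 \\ 1.1230 \\ 0.4573 \\ 1.2770 \\ 0.7236 \\ 0.3131 \\ 1.2848 \\ 0.8827 \\ 0.5446
\end{bmatrix},
\quad
\mathbf{t}^{(20)} = \begin{bmatrix}
0.7845 \\ 1.1380 \\ 0.4716 \\ 1.2963 \\ 0.7486 \\ 0.3263 \\ 1.2992 \\ 0.9010 \\ 0.5567
\end{bmatrix},
\quad
\mathbf{t}^{(30)} = \begin{bmatrix}
0.7846 \\ 1.1383 \\ 0.4719 \\ 1.2967 \\ 0.7491 \\ 0.3265 \\ 1.2995 \\ 0.9014 \\ 0.5570
\end{bmatrix}
\]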


All iterates beginning with the thirtieth are equal to t^(30) to four decimal places.
Consequently, t^(30) is the exact solution to four decimal places. This agrees with our previous
result given in Equation (4).
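
The iteration is easy to program. The following Python sketch applies t^(k+1) = Mt^(k) + b to the nine-point mesh of case (b), starting from t^(0) = 0. The neighbor table `MESH` is an assumption (each interior point is listed with its interior neighbors and its adjacent boundary temperatures), chosen so that the computed iterates agree with the ones listed above; `jacobi_step` is a name made up for this example.

```python
# Assumed mesh for case (b): interior point -> (interior neighbours, boundary temperatures).
MESH = {
    1: ([2], [2, 0, 0]),
    2: ([1, 3, 4], [2]),
    3: ([2, 5], [0, 0]),
    4: ([2, 5, 7], [2]),
    5: ([3, 4, 6, 8], []),
    6: ([5, 9], [0, 0]),
    7: ([4, 8], [2, 1]),
    8: ([5, 7, 9], [1]),
    9: ([6, 8], [1, 0]),
}


def jacobi_step(t):
    """One step t -> Mt + b: average the four neighbours using only the old iterate."""
    new_t = []
    for i in range(1, len(MESH) + 1):
        interior, boundary = MESH[i]
        total = sum(t[j - 1] for j in interior) + sum(boundary)
        new_t.append(total / 4)
    return new_t


iterates = [[0.0] * len(MESH)]          # t^(0) = 0
for _ in range(30):
    iterates.append(jacobi_step(iterates[-1]))

for k in (1, 3, 10, 20, 30):
    print(k, [round(value, 4) for value in iterates[k]])
```

Each step uses only values from the previous iterate, which is what distinguishes this scheme from variants that overwrite the temperatures in place as they are computed.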

The Jacobi iteration scheme applied to the linear system (5) with 49 unknowns produces
iterates that begin repeating to four decimal places after 119 iterations. Thus, t^(119)
would provide the 49 temperatures of case (c) correct to four decimal places.


A Monte Carlo Technique 



▲ Figure 10.10.6 


In this section we describe a so-called Monte Carlo technique for computing the temper- 
ature at a single interior mesh point of the discrete problem without having to compute 
the temperatures at the remaining interior mesh points. First we define a discrete random 
walk along the net. By this we mean a directed path along the net lines (Figure 10.10.6) 
that joins a succession of mesh points such that the direction of departure from each 
mesh point is chosen at random. Each of the four possible directions of departure from 
each mesh point along the path is to be equally probable. 

By the use of random walks, we can compute the temperature at a specified interior 
mesh point on the basis of the following property. 


Random Walk Property 

Let W_1, W_2, ..., W_n be a succession of random walks, all of which begin at a specified
interior mesh point. Let t_1*, t_2*, ..., t_n* be the temperatures at the boundary mesh
points first encountered along each of these random walks. Then the average value
(t_1* + t_2* + ... + t_n*)/n of these boundary temperatures approaches the temperature at
the specified interior mesh point as the number of random walks n increases without
bound.
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This property is a consequence of the discrete mean-value property that the mesh point 
temperatures satisfy. The proof of the random walk property involves elementary con- 
cepts from probability theory, and we will not give it here. 
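
A random walk of this kind takes only a few lines to simulate. The Python sketch below estimates the temperature t_5 of the nine-point mesh of case (b) by averaging the boundary temperatures reached by many walks. The table `MESH` is an assumption (each interior point is listed with its interior neighbors and its adjacent boundary temperatures, four neighbors in all), and the function name, the seed, and the number of walks are arbitrary choices made for this example.

```python
import random

# Assumed mesh for case (b): interior point -> (interior neighbours, boundary temperatures).
MESH = {
    1: ([2], [2, 0, 0]),
    2: ([1, 3, 4], [2]),
    3: ([2, 5], [0, 0]),
    4: ([2, 5, 7], [2]),
    5: ([3, 4, 6, 8], []),
    6: ([5, 9], [0, 0]),
    7: ([4, 8], [2, 1]),
    8: ([5, 7, 9], [1]),
    9: ([6, 8], [1, 0]),
}


def random_walk(start, rng):
    """Walk from an interior point until a boundary point is hit; return its temperature."""
    point = start
    while True:
        interior, boundary = MESH[point]
        k = rng.randrange(4)             # four equally probable directions of departure
        if k < len(interior):
            point = interior[k]          # moved to another interior mesh point
        else:
            return boundary[k - len(interior)]   # reached the boundary


rng = random.Random(0)                   # fixed seed so the run is repeatable
num_walks = 20000
estimate = sum(random_walk(5, rng) for _ in range(num_walks)) / num_walks
print(estimate)
```

Only the starting point enters the computation, so a single temperature can be estimated without solving for the other eight. The price is slow convergence: the error of the average shrinks roughly like one over the square root of the number of walks.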

In Table 2 we display the results of a large number of computer-generated random
walks for the evaluation of the temperature t_5 of the nine-point mesh of case (b) in
Figure 10.10.6. The first column lists the number n of the random walk. The second column
lists the temperature t_n* of the boundary point first encountered along the corresponding
random walk. The last column contains the cumulative average of the boundary
temperatures encountered along the n random walks. Thus, after 1000 random walks we
have the approximation t_5 ≈ .7550. This compares with the exact value t_5 = .7491 that
we had previously evaluated. As can be seen, the convergence to the exact value is not
too rapid.


Table 2

   n    t_n*   (t_1* + ... + t_n*)/n   |      n    t_n*   (t_1* + ... + t_n*)/n
   1     1           1.0000            |     20     1           0.9500
   2     2           1.5000            |     30     0           0.8000
   3     1           1.3333            |     40     0           0.8250
   4     0           1.0000            |     50     2           0.8400
   5     2           1.2000            |    100     0           0.8300
   6     0           1.0000            |    150     1           0.8000
   7     2           1.1429            |    200     0           0.8050
   8     0           1.0000            |    250     1           0.8240
   9     2           1.1111            |    500     1           0.7860
  10     0           1.0000            |   1000     0           0.7550

Exercise Set 10.10 

1. A plate in the form of a circular disk has boundary temperatures
of 0° on the left half of its circumference and 1° on the right
half of its circumference. A net with four interior mesh points
is overlaid on the disk (see Figure Ex-1).

(a) Using the discrete mean-value property, write the 4 × 4
linear system t = Mt + b that determines the approximate
temperatures at the four interior mesh points.

(b) Solve the linear system in part (a).

(c) Use the Jacobi iteration scheme with t^(0) = 0 to generate
the iterates t^(1), t^(2), t^(3), t^(4), and t^(5) for the linear system in
part (a). What is the "error vector" t^(5) − t, where t is the
solution found in part (b)?

(d) By certain advanced methods, it can be determined that the
exact temperatures to four decimal places at the four mesh
points are t_1 = t_3 = .2871 and t_2 = t_4 = .7129. What are
the percentage errors in the values found in part (b)?



2. Use Theorem 10.10.1 to find the exact equilibrium temperature
at the center of the disk in Exercise 1.

3. Calculate the first two iterates t^(1) and t^(2) for case (b) of Figure
10.10.3 with nine interior mesh points [Equation (2)] when the
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initial iterate is chosen as 

t^(0) = [1 1 1 1 1 1 1 1 1]^T

4. The random walk illustrated in Figure Ex-4a can be described
by six arrows that specify the directions of departure from the successive
mesh points along the path. Figure Ex-4b is an array of 100
computer-generated, randomly oriented arrows arranged in a
10 × 10 array. Use these arrows to determine random walks
to approximate the temperature t_5, as in Table 2. Proceed as
follows:

1. Take the last two digits of your telephone number. Use
the last digit to specify a row and the other to specify a
column.

2. Go to the arrow in the array with that row and column
number.

3. Using this arrow as a starting point, move through the array
of arrows as you would read a book (left to right and top
to bottom). Beginning at the point labeled t_5 in Figure Ex-4a
and using this sequence of arrows to specify a sequence
of directions, move from mesh point to mesh point until
you reach a boundary mesh point. This completes your
first random walk. Record the temperature at the boundary
mesh point. (If you reach the end of the arrow array,
continue with the arrow in the upper left corner.)

4. Return to the interior mesh point labeled t_5 and begin
where you left off in the arrow array; generate your next
random walk. Repeat this process until you have completed
10 random walks and have recorded 10 boundary
temperatures.

5. Calculate the average of the 10 boundary temperatures
recorded. (The exact value is t_5 = .7491.)
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Working with Technology 

The following exercises are designed to be solved using a technology
utility. Typically, this will be MATLAB, Mathematica, Maple,
Derive, or Mathcad, but it may also be some other type of linear 
algebra software or a scientific calculator with some linear algebra 


capabilities. For each exercise you will need to read the relevant 
documentation for the particular utility you are using. The goal 
of these exercises is to provide you with a basic proficiency with 
your technology utility. Once you have mastered the techniques 
in these exercises, you will be able to use your technology utility 
to solve many of the problems in the regular exercise sets. 


T1. Suppose that we have the square region described by

\[
\mathcal{R} = \{(x, y) \mid 0 \le x \le 1,\ 0 \le y \le 1\}
\]

and suppose that the equilibrium temperature distribution u(x, y)
along the boundary is given by u(x, 0) = T_B, u(x, 1) = T_T,
u(0, y) = T_L, and u(1, y) = T_R. Suppose next that this region
is partitioned into an (n + 1) × (n + 1) mesh using

\[
x_i = \frac{i}{n} \quad\text{and}\quad y_j = \frac{j}{n}
\]

for i = 0, 1, 2, ..., n and j = 0, 1, 2, ..., n. If the temperatures
of the interior mesh points are labeled by

\[
u_{i,j} = u(x_i, y_j) = u(i/n, j/n)
\]

then show that

\[
u_{i,j} = \tfrac{1}{4}\left(u_{i-1,j} + u_{i+1,j} + u_{i,j-1} + u_{i,j+1}\right)
\]

for i = 1, 2, 3, ..., n − 1 and j = 1, 2, 3, ..., n − 1. To handle
the boundary points, define

\[
u_{0,j} = T_L, \quad u_{n,j} = T_R, \quad u_{i,0} = T_B, \quad\text{and}\quad u_{i,n} = T_T
\]

for i = 1, 2, 3, ..., n − 1 and j = 1, 2, 3, ..., n − 1. Next let

\[
F_{n+1} = \begin{bmatrix} 0 & I_n \\ 1 & 0 \end{bmatrix}
\]

be the (n + 1) × (n + 1) matrix with the n × n identity matrix in
the upper right-hand corner, a one in the lower left-hand corner,
and zeros everywhere else. For example,

\[
F_2 = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}, \quad
F_3 = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{bmatrix}, \quad
F_4 = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \end{bmatrix}, \quad
F_5 = \begin{bmatrix} 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 & 0 \end{bmatrix}
\]

By defining the (n + 1) × (n + 1) matrix

\[
M_{n+1} = F_{n+1} + F_{n+1}^{T}
= \begin{bmatrix} 0 & I_n \\ 1 & 0 \end{bmatrix}
+ \begin{bmatrix} 0 & 1 \\ I_n & 0 \end{bmatrix}
\]

show that if U_{n+1} is the (n + 1) × (n + 1) matrix with entries u_{i,j},
then the set of equations

\[
u_{i,j} = \tfrac{1}{4}\left(u_{i-1,j} + u_{i+1,j} + u_{i,j-1} + u_{i,j+1}\right)
\]

for i = 1, 2, 3, ..., n − 1 and j = 1, 2, 3, ..., n − 1 can be written
as the matrix equation

\[
U_{n+1} = \tfrac{1}{4}\left(M_{n+1}U_{n+1} + U_{n+1}M_{n+1}\right)
\]

where we consider only those elements of U_{n+1} with
i = 1, 2, 3, ..., n − 1 and j = 1, 2, 3, ..., n − 1.


T2. The results of the preceding exercise and the discussion in the
text suggest the following algorithm for solving for the equilibrium
temperature in the square region

\[
\mathcal{R} = \{(x, y) \mid 0 \le x \le 1,\ 0 \le y \le 1\}
\]

given the boundary conditions

\[
u(x, 0) = T_B, \quad u(x, 1) = T_T, \quad u(0, y) = T_L, \quad u(1, y) = T_R
\]

1. Choose a value for n, and then choose an initial guess, say

\[
U_{n+1}^{(0)} = \begin{bmatrix}
0 & T_L & \cdots & T_L & 0 \\
T_B & 0 & \cdots & 0 & T_T \\
\vdots & \vdots & & \vdots & \vdots \\
T_B & 0 & \cdots & 0 & T_T \\
0 & T_R & \cdots & T_R & 0
\end{bmatrix}
\]

2. For each value of k = 0, 1, 2, 3, ..., compute U_{n+1}^{(k+1)} using

\[
U_{n+1}^{(k+1)} = \tfrac{1}{4}\left(M_{n+1}U_{n+1}^{(k)} + U_{n+1}^{(k)}M_{n+1}\right)
\]

where M_{n+1} is as defined in Exercise T1. Then adjust U_{n+1}^{(k+1)}
by replacing all edge entries by the initial edge entries in U_{n+1}^{(0)}.
[Note: The edge entries of a matrix are the entries in the first
and last columns and first and last rows.]

3. Continue this process until U_{n+1}^{(k+1)} − U_{n+1}^{(k)} is approximately
the zero matrix. This suggests that

\[
U_{n+1} = \lim_{k \to \infty} U_{n+1}^{(k)}
\]
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Use a computer and this algorithm to solve for u (x , y) given that 

\[
u(x, 0) = 0, \quad u(x, 1) = 0, \quad u(0, y) = 0, \quad u(1, y) = 2
\]

Choose n = 6 and compute up to U_{n+1}^{(30)}. The exact solution can
be expressed as

\[
u(x, y) = \frac{8}{\pi} \sum_{m=1}^{\infty}
\frac{\sinh[(2m - 1)\pi x]\,\sin[(2m - 1)\pi y]}{(2m - 1)\sinh[(2m - 1)\pi]}
\]

Use a computer to compute u(i/6, j/6) for i, j = 0, 1, 2, 3, 4, 5,
6, and then compare your results to the values of u(i/6, j/6) in
U_{n+1}^{(30)}.

T3. Using the exact solution u(x, y) for the temperature distribution
described in Exercise T2, use a graphing program to do the
following:

(a) Plot the surface z = u(x, y) in three-dimensional xyz-space
in which z is the temperature at the point (x, y) in the square
region.

(b) Plot several isotherms of the temperature distribution (curves 
in the xy-plane over which the temperature is a constant). 

(c) Plot several curves of the temperature as a function of x with 
y held constant. 

(d) Plot several curves of the temperature as a function of y with 
x held constant. 


10.11 Computed Tomography

In this section we will see how constructing a cross-sectional view of a human body by 
analyzing X-ray scans leads to an inconsistent linear system. We present an iteration 
technique that provides an “approximate solution” of the linear system. 


PREREQUISITES: Linear Systems
Natural Logarithms
Euclidean Space R^n


The basic problem of computed tomography is to construct an image of a cross section
of the human body using data collected from many individual beams of X rays that
are passed through the cross section. These data are processed by a computer, and the
computed cross section is displayed on a video monitor. Figure 10.11.1 is a diagram of
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▲ Figure 10.11.2 

[Image: Edward Kinsman/Photo Researchers/Getty Images]


Scanning Modes 


General Electric’s CT system showing a patient prepared to have a cross section of his 
head scanned by X-ray beams. 

Such a system is also known as a CAT scanner, for Computer-Aided Tomography
scanner. Figure 10.11.2 shows a typical cross section of a human head produced by the
system.

The first commercial system of computed tomography for medical use was developed
in 1971 by G. N. Hounsfield of EMI, Ltd., in England. In 1979, Hounsfield and
A. M. Cormack were awarded the Nobel Prize for their pioneering work in the field. As
we will see in this section, the construction of a cross section, or tomograph, requires 
the solution of a large linear system of equations. Certain algorithms, called algebraic 
reconstruction techniques (ARTs), can be used to solve these linear systems, whose 
solutions yield the cross sections in digital form. 


Unlike conventional X-ray pictures that are formed by X rays that are projected per- 
pendicular to the plane of the picture, tomographs are constructed from thousands of 
individual, hairline-thin X-ray beams that lie in the plane of the cross section. After 
they pass through the cross section, the intensities of the X-ray beams are measured by 
an X-ray detector, and these measurements are relayed to a computer where they are 
processed. Figures 10.11.3 and 10.11.4 illustrate two possible modes of scanning the 
cross section: the parallel mode and the fan-beam mode. In the parallel mode a single 
X-ray source and X-ray detector pair are translated across the field of view containing 
the cross section, and many measurements of the parallel beams are recorded. Then 
the source and detector pair are rotated through a small angle, and another set of mea- 
surements is taken. This is repeated until the desired number of beam measurements 
is completed. For example, in the original 1971 machine, 160 parallel measurements 
were taken through 180 angles spaced 1° apart: a total of 160 × 180 = 28,800 beam
measurements. Each such scan took approximately 5½ minutes.


▲ Figure 10.11.3 Parallel mode. [Labels: X-ray source, X-ray detector]



In the fan-beam mode of scanning, a single X-ray tube generates a fan of collimated 
beams whose intensities are measured simultaneously by an array of detectors on the 
other side of the field of view. The X-ray tube and detector array are rotated through many 
angles, and a set of measurements is taken at each angle until the scan is completed. In 
the General Electric CT system, which uses the fan-beam mode, each scan takes 1 second.

Derivation of Equations

▲ Figure 10.11.5


To see how the cross section is reconstructed from the many individual beam measure- 
ments, refer to Figure 10.11.5. Here the field of view in which the cross section is situated 
has been divided into many square pixels (picture elements) numbered 1 through N as 
indicated. It is our desire to determine the X-ray density of each pixel. In the EMI 
system, 6400 pixels were used, arranged in a square 80 × 80 array. The G.E. CT system
uses 262,144 pixels in a 512 × 512 array, each pixel being about 1 mm on a side. After
the densities of the pixels are determined by the method we will describe, they are
reproduced on a video monitor, with each pixel shaded a level of gray proportional to 
its X-ray density. Because different tissues within the human body have different X-ray 
densities, the video display clearly distinguishes the various tissues and organs within 
the cross section. 

Figure 10.11.6 shows a single pixel with an X-ray beam of roughly the same width
as the pixel passing squarely through it. The photons constituting the X-ray beam are
absorbed by the tissue within the pixel at a rate proportional to the X-ray density of the
tissue. Quantitatively, the X-ray density of the jth pixel is denoted by $x_j$ and is defined by

$$x_j = \ln\left(\frac{\text{number of photons entering the } j\text{th pixel}}{\text{number of photons leaving the } j\text{th pixel}}\right)$$

where "ln" denotes the natural logarithmic function. Using the logarithm property
$\ln(a/b) = -\ln(b/a)$, we also have

$$x_j = -\ln\left(\text{fraction of photons that pass through the } j\text{th pixel without being absorbed}\right)$$


If the X-ray beam passes through an entire row of pixels (Figure 10.11.7), then the
number of photons leaving one pixel is equal to the number of photons entering the next
pixel in the row. If the pixels are numbered 1, 2, ..., n, then the additive property of the
logarithmic function gives

$$\begin{aligned}
x_1 + x_2 + \cdots + x_n &= \ln\left(\frac{\text{number of photons entering the first pixel}}{\text{number of photons leaving the } n\text{th pixel}}\right) \\
&= -\ln\left(\text{fraction of photons that pass through the row of } n \text{ pixels without being absorbed}\right)
\end{aligned} \tag{1}$$


Thus, to determine the total X-ray density of a row of pixels, we simply sum the individual 
pixel densities. 
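
The additivity can be checked by writing out the sum. In the sketch below, $P_0$ denotes the number of photons entering the first pixel and $P_j$ the number leaving the jth pixel; these symbols are introduced here only for illustration and are not part of the text's notation. Because the photons leaving one pixel are the photons entering the next, the sum telescopes:

```latex
\begin{aligned}
x_1 + x_2 + \cdots + x_n
  &= \ln\frac{P_0}{P_1} + \ln\frac{P_1}{P_2} + \cdots + \ln\frac{P_{n-1}}{P_n} \\
  &= \ln\left(\frac{P_0}{P_1}\cdot\frac{P_1}{P_2}\cdots\frac{P_{n-1}}{P_n}\right)
   = \ln\frac{P_0}{P_n}
\end{aligned}
```

As a made-up numerical illustration, if 1000 photons enter a row of three pixels and 800, 600, and 300 photons leave the first, second, and third pixels, then $x_1 = \ln(1000/800) \approx 0.2231$, $x_2 = \ln(800/600) \approx 0.2877$, and $x_3 = \ln(600/300) \approx 0.6931$, whose sum is approximately $1.2040 = \ln(1000/300)$.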



[Figure labels: photons entering the jth pixel; photons leaving the jth pixel]

▲ Figure 10.11.7

Next, consider the X-ray beam in Figure 10.11.5. By the beam density of the ith
beam of a scan, denoted by $b_i$, we mean

$$\begin{aligned}
b_i &= \ln\left(\frac{\text{number of photons of the } i\text{th beam entering the detector without the cross section in the field of view}}{\text{number of photons of the } i\text{th beam entering the detector with the cross section in the field of view}}\right) \\
&= -\ln\left(\text{fraction of photons of the } i\text{th beam that pass through the cross section without being absorbed}\right)
\end{aligned} \tag{2}$$


The numerator in the first expression for $b_i$ is obtained by performing a calibration scan
without the cross section in the field of view. The resulting detector measurements are
stored within the computer's memory. Then a clinical scan is performed with the cross
section in the field of view, the $b_i$'s of all the beams constituting the scan are computed,
and the values are stored for further processing.

For each beam that passes squarely through a row of pixels, we must have

$$\left(\begin{array}{c}\text{fraction of photons of the beam that pass through the}\\ \text{row of pixels without being absorbed}\end{array}\right) = \left(\begin{array}{c}\text{fraction of photons of the beam that pass through the}\\ \text{cross section without being absorbed}\end{array}\right)$$

Thus, if the ith beam passes squarely through a row of n pixels, then it follows from
Equations (1) and (2) that

$$x_1 + x_2 + \cdots + x_n = b_i$$

In this equation, $b_i$ is known from the clinical and calibration measurements, and
$x_1, x_2, \ldots, x_n$ are unknown pixel densities that must be determined.

More generally, if the ith beam passes squarely through a row (or column) of pixels
with numbers $j_1, j_2, \ldots, j_n$, then we have

$$x_{j_1} + x_{j_2} + \cdots + x_{j_n} = b_i$$

If we set

$$a_{ij} = \begin{cases} 1, & \text{if } j = j_1, j_2, \ldots, j_n \\ 0, & \text{otherwise} \end{cases}$$

then we can write this equation as

$$a_{i1}x_1 + a_{i2}x_2 + \cdots + a_{iN}x_N = b_i \tag{3}$$

We will refer to Equation (3) as the ith beam equation.
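
As a small illustration of Equation (3), the following Python sketch builds the 0/1 coefficients of one beam equation from the numbers of the pixels the beam crosses squarely and evaluates the left side for a given set of pixel densities. The function names `beam_row` and `beam_density` and the sample numbers are hypothetical, chosen only for this example.

```python
def beam_row(pixels_hit, n_pixels):
    """Coefficients (a_i1, ..., a_iN) of one beam equation.

    pixels_hit holds the 1-based pixel numbers j_1, ..., j_n that the
    beam crosses squarely; every other coefficient is 0.
    """
    hit = set(pixels_hit)
    return [1 if j in hit else 0 for j in range(1, n_pixels + 1)]


def beam_density(row, x):
    """Left side of the beam equation: a_i1*x_1 + ... + a_iN*x_N."""
    return sum(a * xj for a, xj in zip(row, x))


# Illustrative data (not from the text): a beam crossing pixels 4, 5, 6
# of a 9-pixel field of view with made-up densities.
row = beam_row([4, 5, 6], 9)
x = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
print(row)
print(beam_density(row, x))
```

Only the pixels in the beam's path contribute to the sum, so for this beam the result is $x_4 + x_5 + x_6$.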

Referring to Figure 10.11.5, however, we see that the beams of a scan do not neces- 
sarily pass through a row or column of pixels squarely. Instead, a typical beam passes 
diagonally through each pixel in its path. There are many ways to take this into account. 
In Figure 10.11.8 we outline three methods of defining the quantities $a_{ij}$ that appear in
Equation (3), each of which reduces to our previous definition when the beam passes 
squarely through a row or column of pixels. Reading down the figure, each method is 
more exact than its predecessor, but with successively more computational difficulty.

Center-of-Pixel Method

$$a_{ij} = \begin{cases} 1 & \text{if the } i\text{th beam passes through the center of the } j\text{th pixel} \\ 0 & \text{otherwise} \end{cases}$$

Center Line Method

$$a_{ij} = \frac{\text{length of the center line of the } i\text{th beam that lies in the } j\text{th pixel}}{\text{width of the } j\text{th pixel}}$$

Area Method

$$a_{ij} = \frac{\text{area of the } i\text{th beam that lies in the } j\text{th pixel}}{\text{area of the } i\text{th beam that would lie in the } j\text{th pixel if the } i\text{th beam were to cross the pixel squarely}}$$

▲ Figure 10.11.8

Using any one of the three methods to define the $a_{ij}$'s in the ith beam equation, we
can write the set of M beam equations in a complete scan as

$$\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1N}x_N &= b_1 \\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2N}x_N &= b_2 \\
&\;\;\vdots \\
a_{M1}x_1 + a_{M2}x_2 + \cdots + a_{MN}x_N &= b_M
\end{aligned} \tag{4}$$

In this way we have a linear system of M equations (the M beam equations) in N
unknowns (the N pixel densities).

Depending on the number of beams and pixels used, we can have M > N, M = N,
or M < N. We will consider only the case M > N, the so-called overdetermined case,
in which there are more beams in the scan than pixels in the field of view. Because of
inherent modeling and experimental errors in the problem, we should not expect our
linear system to have an exact mathematical solution for the pixel densities. In the next
section we attempt to find an "approximate" solution to this linear system.

Algebraic Reconstruction Techniques

There have been many mathematical algorithms devised to treat the overdetermined 
linear system (4). The one we will describe belongs to the class of so-called Algebraic 
Reconstruction Techniques (ARTs). This method, which can be traced to an iterative 
technique originally introduced by S. Kaczmarz in 1937, was the one used in the first 
commercial machine. To introduce this technique, consider the following system of three
equations in two unknowns:

$$\begin{aligned}
L_1:&\quad x_1 + x_2 = 2 \\
L_2:&\quad x_1 - 2x_2 = -2 \\
L_3:&\quad 3x_1 - x_2 = 3
\end{aligned} \tag{5}$$

▲ Figure 10.11.9

The lines $L_1$, $L_2$, $L_3$ determined by these three equations are plotted in the $x_1x_2$-plane.
As shown in Figure 10.11.9a, the three lines do not have a common intersection, and
so the three equations do not have an exact solution. However, the points $(x_1, x_2)$ on
the shaded triangle formed by the three lines are all situated "near" these three lines
and can be thought of as constituting "approximate" solutions to our system. The
following iterative procedure describes a geometric construction for generating points
on the boundary of that triangular region (Figure 10.11.9b):


Algorithm 1 

Step 0. Choose an arbitrary starting point $\mathbf{x}_0$ in the $x_1x_2$-plane.

Step 1. Project $\mathbf{x}_0$ orthogonally onto the first line $L_1$ and call the projection $\mathbf{x}_1^{(1)}$. The
superscript (1) indicates that this is the first of several cycles through the steps.

Step 2. Project $\mathbf{x}_1^{(1)}$ orthogonally onto the second line $L_2$ and call the projection $\mathbf{x}_2^{(1)}$.

Step 3. Project $\mathbf{x}_2^{(1)}$ orthogonally onto the third line $L_3$ and call the projection $\mathbf{x}_3^{(1)}$.

Step 4. Take $\mathbf{x}_3^{(1)}$ as the new value of $\mathbf{x}_0$ and cycle through Steps 1 through 3 again. In
the second cycle, label the projected points $\mathbf{x}_1^{(2)}$, $\mathbf{x}_2^{(2)}$, $\mathbf{x}_3^{(2)}$; in the third cycle, label
the projected points $\mathbf{x}_1^{(3)}$, $\mathbf{x}_2^{(3)}$, $\mathbf{x}_3^{(3)}$; and so forth.

This algorithm generates three sequences of points

$$\begin{aligned}
L_1:&\quad \mathbf{x}_1^{(1)}, \mathbf{x}_1^{(2)}, \mathbf{x}_1^{(3)}, \ldots \\
L_2:&\quad \mathbf{x}_2^{(1)}, \mathbf{x}_2^{(2)}, \mathbf{x}_2^{(3)}, \ldots \\
L_3:&\quad \mathbf{x}_3^{(1)}, \mathbf{x}_3^{(2)}, \mathbf{x}_3^{(3)}, \ldots
\end{aligned}$$

that lie on the three lines $L_1$, $L_2$, and $L_3$, respectively. It can be shown that as long as
the three lines are not all parallel, then the first sequence converges to a point $\mathbf{x}_1^*$ on $L_1$,
the second sequence converges to a point $\mathbf{x}_2^*$ on $L_2$, and the third sequence converges to
a point $\mathbf{x}_3^*$ on $L_3$ (Figure 10.11.9c). These three limit points form what is called the limit
cycle of the iterative process. It can be shown that the limit cycle is independent of the
starting point $\mathbf{x}_0$.

Next we discuss the specific formulas needed to effect the orthogonal projections in
Algorithm 1. First, because the equation of a line in $x_1x_2$-space is

$$a_1x_1 + a_2x_2 = b$$

we can express it in vector form as

$$\mathbf{a}^T\mathbf{x} = b$$

where

$$\mathbf{a} = \begin{bmatrix} a_1 \\ a_2 \end{bmatrix} \quad\text{and}\quad \mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$$

The following theorem gives the necessary projection formula (Exercise 5).

THEOREM 10.11.1 Orthogonal Projection Formula

Let L be a line in $R^2$ with equation $\mathbf{a}^T\mathbf{x} = b$, and let $\mathbf{x}^*$ be any point in $R^2$ (Figure
10.11.10). Then the orthogonal projection, $\mathbf{x}_p$, of $\mathbf{x}^*$ onto L is given by

$$\mathbf{x}_p = \mathbf{x}^* + \frac{(b - \mathbf{a}^T\mathbf{x}^*)}{\mathbf{a}^T\mathbf{a}}\,\mathbf{a}$$


► EXAMPLE 1 Using Algorithm 1 

We can use Algorithm 1 to find an approximate solution of the linear system given in (5)
and illustrated in Figure 10.11.9. If we write the equations of the three lines as

$$L_1:\ \mathbf{a}_1^T\mathbf{x} = b_1, \qquad L_2:\ \mathbf{a}_2^T\mathbf{x} = b_2, \qquad L_3:\ \mathbf{a}_3^T\mathbf{x} = b_3$$

where

$$\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}, \quad \mathbf{a}_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \quad \mathbf{a}_2 = \begin{bmatrix} 1 \\ -2 \end{bmatrix}, \quad \mathbf{a}_3 = \begin{bmatrix} 3 \\ -1 \end{bmatrix}$$

$$b_1 = 2, \quad b_2 = -2, \quad b_3 = 3$$

then, using Theorem 10.11.1, we can express the iteration scheme in Algorithm 1 as

$$\mathbf{x}_k^{(p)} = \mathbf{x}_{k-1}^{(p)} + \frac{(b_k - \mathbf{a}_k^T\mathbf{x}_{k-1}^{(p)})}{\mathbf{a}_k^T\mathbf{a}_k}\,\mathbf{a}_k, \qquad k = 1, 2, 3$$

where p = 1 for the first cycle of iterates, p = 2 for the second cycle of iterates, and so
forth. After each cycle of iterates (i.e., after $\mathbf{x}_3^{(p)}$ is computed), the next cycle of iterates
is begun with $\mathbf{x}_0^{(p+1)}$ set equal to $\mathbf{x}_3^{(p)}$.

Table 1 gives the numerical results of six cycles of iterations starting with the initial
point $\mathbf{x}_0 = (1, 3)$.

Table 1

Iterate        x_1        x_2
x_0          1.00000    3.00000
x_1^(1)       .00000    2.00000
x_2^(1)       .40000    1.20000
x_3^(1)      1.30000     .90000
x_1^(2)      1.20000     .80000
x_2^(2)       .88000    1.44000
x_3^(2)      1.42000    1.26000
x_1^(3)      1.08000     .92000
x_2^(3)       .83200    1.41600
x_3^(3)      1.40800    1.22400
x_1^(4)      1.09200     .90800
x_2^(4)       .83680    1.41840
x_3^(4)      1.40920    1.22760
x_1^(5)      1.09080     .90920
x_2^(5)       .83632    1.41816
x_3^(5)      1.40908    1.22724
x_1^(6)      1.09092     .90908
x_2^(6)       .83637    1.41818
x_3^(6)      1.40909    1.22728
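
The cycles of Algorithm 1 can be reproduced with a short program. The following Python sketch applies the projection formula of Theorem 10.11.1 to the three lines of system (5), starting from (1, 3); the names `project`, `LINES`, and `run_cycles` are hypothetical and used only for this illustration.

```python
def project(a, b, x):
    """Orthogonal projection of x onto the line a^T x = b."""
    t = (b - sum(ai * xi for ai, xi in zip(a, x))) / sum(ai * ai for ai in a)
    return [xi + t * ai for ai, xi in zip(a, x)]


# (a_k, b_k) for the lines L1, L2, L3 of system (5).
LINES = [([1.0, 1.0], 2.0), ([1.0, -2.0], -2.0), ([3.0, -1.0], 3.0)]


def run_cycles(x0, cycles):
    """Return one list per cycle holding the iterates x_1, x_2, x_3."""
    history, x = [], list(x0)
    for _ in range(cycles):
        cycle = []
        for a, b in LINES:
            x = project(a, b, x)  # Steps 1-3: project onto the next line
            cycle.append(x)
        history.append(cycle)     # Step 4: last iterate starts the next cycle
    return history


history = run_cycles([1.0, 3.0], 6)
for p, cycle in enumerate(history, start=1):
    for k, point in enumerate(cycle, start=1):
        print(f"x_{k}^({p}) = ({point[0]:.5f}, {point[1]:.5f})")
```

Printed to five decimal places, the first cycle gives (0, 2), (0.4, 1.2), and (1.3, 0.9), and the later cycles can be compared line by line with the numerical results quoted for this example.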

Using certain techniques that are impractical for large linear systems, we can show
the exact values of the points of the limit cycle in this example to be

$$\begin{aligned}
\mathbf{x}_1^* &= \left(\tfrac{12}{11}, \tfrac{10}{11}\right) = (1.09090\ldots,\ .90909\ldots) \\
\mathbf{x}_2^* &= \left(\tfrac{46}{55}, \tfrac{78}{55}\right) = (.83636\ldots,\ 1.41818\ldots) \\
\mathbf{x}_3^* &= \left(\tfrac{31}{22}, \tfrac{27}{22}\right) = (1.40909\ldots,\ 1.22727\ldots)
\end{aligned}$$

It can be seen that the sixth cycle of iterates provides an excellent approximation to the
limit cycle. Any one of the three iterates $\mathbf{x}_1^{(6)}$, $\mathbf{x}_2^{(6)}$, $\mathbf{x}_3^{(6)}$ can be used as an approximate
solution of the linear system. (The large discrepancies in the values of $\mathbf{x}_1^{(6)}$, $\mathbf{x}_2^{(6)}$, and $\mathbf{x}_3^{(6)}$
are due to the artificial nature of this illustrative example. In practical problems, these
discrepancies would be much smaller.) ◄

To generalize Algorithm 1 so that it applies to an overdetermined system of M
equations in N unknowns,

$$\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1N}x_N &= b_1 \\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2N}x_N &= b_2 \\
&\;\;\vdots \\
a_{M1}x_1 + a_{M2}x_2 + \cdots + a_{MN}x_N &= b_M
\end{aligned} \tag{6}$$

we introduce column vectors $\mathbf{x}$ and $\mathbf{a}_i$ as follows:

$$\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_N \end{bmatrix}, \qquad \mathbf{a}_i = \begin{bmatrix} a_{i1} \\ a_{i2} \\ \vdots \\ a_{iN} \end{bmatrix}, \qquad i = 1, 2, \ldots, M$$

With these vectors, the M equations constituting our linear system (6) can be written in
vector form as

$$\mathbf{a}_i^T\mathbf{x} = b_i, \qquad i = 1, 2, \ldots, M$$

Each of these M equations defines what is called a hyperplane in the N-dimensional
Euclidean space $R^N$. In general these M hyperplanes have no common intersection, and
so we seek instead some point in $R^N$ that is reasonably "close" to all of them. Such a
point will constitute an approximate solution of the linear system, and its N entries will
determine approximate pixel densities with which to form the desired cross section.

As in the two-dimensional case, we will introduce an iterative process that generates
cycles of successive orthogonal projections onto the M hyperplanes beginning with some
arbitrary initial point in $R^N$. Our notation for these successive iterates is

$$\mathbf{x}_k^{(p)} = \left(\begin{array}{c}\text{the iterate lying on the } k\text{th hyperplane}\\ \text{generated during the } p\text{th cycle of iterations}\end{array}\right)$$


The algorithm is as follows: 


Algorithm 2 

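Step 0. Choose any point in $R^N$ and label it $\mathbf{x}_0^{(1)}$.

Step 1. For the first cycle of iterates, set p = 1.

Step 2. For k = 1, 2, ..., M, compute

$$\mathbf{x}_k^{(p)} = \mathbf{x}_{k-1}^{(p)} + \frac{(b_k - \mathbf{a}_k^T\mathbf{x}_{k-1}^{(p)})}{\mathbf{a}_k^T\mathbf{a}_k}\,\mathbf{a}_k$$

Step 3. Set $\mathbf{x}_0^{(p+1)} = \mathbf{x}_M^{(p)}$.

Step 4. Increase the cycle number p by 1 and return to Step 2.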

In Step 2 the iterate $\mathbf{x}_k^{(p)}$ is called the orthogonal projection of $\mathbf{x}_{k-1}^{(p)}$ onto the hyperplane
$\mathbf{a}_k^T\mathbf{x} = b_k$. Consequently, as in the two-dimensional case, this algorithm determines a
sequence of orthogonal projections from one hyperplane onto the next in which we cycle
back to the first hyperplane after each projection onto the last hyperplane.

It can be shown that if the vectors $\mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_M$ span $R^N$, then the iterates $\mathbf{x}_M^{(1)}$, $\mathbf{x}_M^{(2)}$,
$\mathbf{x}_M^{(3)}$, ... lying on the Mth hyperplane will converge to a point $\mathbf{x}_M^*$ on that hyperplane
which does not depend on the choice of the initial point $\mathbf{x}_0^{(1)}$. In computed tomography,
one of the iterates $\mathbf{x}_M^{(p)}$ for p sufficiently large is taken as an approximate solution of the
linear system for the pixel densities.

Note that for the center-of-pixel method, the scalar quantity $\mathbf{a}_k^T\mathbf{a}_k$ appearing in the
equation in Step 2 of the algorithm is simply the number of pixels in which the kth beam
passes through the center. Similarly, note that the scalar quantity

$$b_k - \mathbf{a}_k^T\mathbf{x}_{k-1}^{(p)}$$

in that same equation can be interpreted as the excess kth beam density that results
if the pixel densities are set equal to the entries of $\mathbf{x}_{k-1}^{(p)}$. This provides the following
interpretation of our ART iteration scheme for the center-of-pixel method: Generate the
pixel densities of each iterate by distributing the excess beam density of successive beams
in the scan evenly among those pixels in which the beam passes through the center. When
the last beam in the scan has been reached, return to the first beam and continue.
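
The "distribute the excess evenly" interpretation can be sketched as a single update step. In the Python sketch below, the function name `distribute_excess`, the 0-based pixel indexing, and the sample numbers are assumptions made for illustration; with 0/1 coefficients the update coincides with the projection in Step 2 of Algorithm 2.

```python
def distribute_excess(x, pixels, b_k):
    """One ART update for the center-of-pixel method.

    x      : current pixel densities
    pixels : 0-based indices of the pixels whose centers the kth beam crosses
    b_k    : measured density of the kth beam
    """
    excess = b_k - sum(x[j] for j in pixels)   # b_k - a_k^T x
    share = excess / len(pixels)               # a_k^T a_k = number of pixels hit
    updated = list(x)
    for j in pixels:
        updated[j] += share
    return updated


# Made-up data: four pixels, a beam through pixels 0 and 2 with density 6.
x_new = distribute_excess([1.0, 2.0, 3.0, 4.0], [0, 2], 6.0)
print(x_new)  # [2.0, 2.0, 4.0, 4.0]
```

After the update, the densities of the pixels in the beam's path sum to the measured beam density (2 + 4 = 6), while the other pixels are unchanged.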


► EXAMPLE 2 Using Algorithm 2 

We can use Algorithm 2 to find the unknown pixel densities of the 9 pixels arranged in the
3 × 3 array illustrated in Figure 10.11.11. These 9 pixels are scanned using the parallel
mode with 12 beams whose measured beam densities are indicated in the figure. We
choose the center-of-pixel method to set up the 12 beam equations. (In Exercises 7 and
8, you are asked to set up the beam equations using the center line and area methods.)
As you can verify, the beam equations are

$$\begin{aligned}
x_7 + x_8 + x_9 &= 13.00 & \qquad x_3 + x_6 + x_9 &= 18.00 \\
x_4 + x_5 + x_6 &= 15.00 & \qquad x_2 + x_5 + x_8 &= 12.00 \\
x_1 + x_2 + x_3 &= 8.00 & \qquad x_1 + x_4 + x_7 &= 6.00 \\
x_6 + x_8 + x_9 &= 14.79 & \qquad x_2 + x_3 + x_6 &= 10.51 \\
x_3 + x_5 + x_7 &= 14.31 & \qquad x_1 + x_5 + x_9 &= 16.13 \\
x_1 + x_2 + x_4 &= 3.81 & \qquad x_4 + x_7 + x_8 &= 7.04
\end{aligned}$$

(The left column lists the equations for beams 1 through 6 and the right column those for
beams 7 through 12.) Table 2 illustrates the results of the iteration scheme starting with an
initial iterate $\mathbf{x}_0 = \mathbf{0}$. The table gives the values of each of the first cycle of iterates,
$\mathbf{x}_1^{(1)}$ through $\mathbf{x}_{12}^{(1)}$, but thereafter gives the iterates $\mathbf{x}_{12}^{(p)}$ only for various values of p.
The iterates $\mathbf{x}_{12}^{(p)}$ start repeating to two decimal places for p ≥ 45, and so we take the
entries of $\mathbf{x}_{12}^{(45)}$ as approximate values of the 9 pixel densities.
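
For readers who want to reproduce the iterates of this example, the following Python sketch runs Algorithm 2 with the center-of-pixel coefficients of the twelve beam equations. The names `BEAMS` and `art_cycle` are hypothetical, and each beam is stored as the pixel numbers it crosses together with its measured density, an encoding chosen here for convenience.

```python
# (pixel numbers crossed, beam density) for beams 1 through 12.
BEAMS = [
    ((7, 8, 9), 13.00), ((4, 5, 6), 15.00), ((1, 2, 3), 8.00),
    ((6, 8, 9), 14.79), ((3, 5, 7), 14.31), ((1, 2, 4), 3.81),
    ((3, 6, 9), 18.00), ((2, 5, 8), 12.00), ((1, 4, 7), 6.00),
    ((2, 3, 6), 10.51), ((1, 5, 9), 16.13), ((4, 7, 8), 7.04),
]


def art_cycle(x, beams):
    """One full cycle of Algorithm 2 for 0/1 (center-of-pixel) coefficients."""
    x = list(x)
    for pixels, b_k in beams:
        # Excess beam density, shared evenly among the pixels the beam crosses.
        share = (b_k - sum(x[j - 1] for j in pixels)) / len(pixels)
        for j in pixels:
            x[j - 1] += share
    return x


first = art_cycle([0.0] * 9, BEAMS)
print([round(v, 2) for v in first])
# [1.06, 0.13, 4.22, 0.58, 7.49, 6.16, 2.85, 3.61, 7.58]

x = first
for _ in range(44):          # cycles 2 through 45
    x = art_cycle(x, BEAMS)
print([round(v, 2) for v in x])
```

The first printed list is the last iterate of the first cycle; the second list is the last iterate of the 45th cycle and can be compared with the final row of Table 2.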


▲ Figure 10.11.11 [Legible beam labels: $b_1 = 13.00$, $b_2 = 15.00$, $b_3 = 8.00$, $b_6 = 3.81$, $b_{10} = 10.51$]


We close this section by noting that the field of computed tomography is presently a 
very active research area. In fact, the ART scheme discussed here has been replaced in 
commercial systems by more sophisticated techniques that are faster and provide a more 
accurate view of the cross section. However, all the new techniques address the same basic 
mathematical problem: finding a good approximate solution of a large overdetermined 
inconsistent linear system of equations.

Table 2

Pixel Densities

Iterate        x_1    x_2    x_3    x_4    x_5    x_6    x_7    x_8    x_9
x_0            .00    .00    .00    .00    .00    .00    .00    .00    .00
x_1^(1)        .00    .00    .00    .00    .00    .00   4.33   4.33   4.33
x_2^(1)        .00    .00    .00   5.00   5.00   5.00   4.33   4.33   4.33
x_3^(1)       2.67   2.67   2.67   5.00   5.00   5.00   4.33   4.33   4.33
x_4^(1)       2.67   2.67   2.67   5.00   5.00   5.37   4.33   4.71   4.71
x_5^(1)       2.67   2.67   3.44   5.00   5.77   5.37   5.10   4.71   4.71
x_6^(1)        .49    .49   3.44   2.83   5.77   5.37   5.10   4.71   4.71
x_7^(1)        .49    .49   4.93   2.83   5.77   6.87   5.10   4.71   6.20
x_8^(1)        .49    .84   4.93   2.83   6.11   6.87   5.10   5.05   6.20
x_9^(1)       -.31    .84   4.93   2.02   6.11   6.87   4.30   5.05   6.20
x_10^(1)      -.31    .13   4.22   2.02   6.11   6.16   4.30   5.05   6.20
x_11^(1)      1.06    .13   4.22   2.02   7.49   6.16   4.30   5.05   7.58
x_12^(1)      1.06    .13   4.22    .58   7.49   6.16   2.85   3.61   7.58
x_12^(2)      2.03    .69   4.42   1.34   7.49   5.39   2.65   3.04   6.61
x_12^(3)      1.78    .51   4.52   1.26   7.49   5.48   2.56   3.22   6.86
x_12^(4)      1.82    .52   4.62   1.37   7.49   5.37   2.45   3.22   6.82
x_12^(5)      1.79    .49   4.71   1.43   7.49   5.31   2.37   3.25   6.85
x_12^(10)     1.68    .44   5.03   1.70   7.49   5.03   2.04   3.29   6.96
x_12^(20)     1.49    .48   5.29   2.00   7.49   4.73   1.79   3.25   7.15
x_12^(30)     1.38    .55   5.34   2.11   7.49   4.62   1.74   3.19   7.26
x_12^(40)     1.33    .59   5.33   2.14   7.49   4.59   1.75   3.15   7.31
x_12^(45)     1.32    .60   5.32   2.15   7.49   4.59   1.76   3.14   7.32


Exercise Set 10.11 


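1. (a) Setting $\mathbf{x}_k^{(p)} = (x_{k1}^{(p)}, x_{k2}^{(p)})$, show that the three projection
equations

$$\mathbf{x}_k^{(p)} = \mathbf{x}_{k-1}^{(p)} + \frac{(b_k - \mathbf{a}_k^T\mathbf{x}_{k-1}^{(p)})}{\mathbf{a}_k^T\mathbf{a}_k}\,\mathbf{a}_k, \qquad k = 1, 2, 3$$

for the three lines in Equation (5) can be written as

$$\begin{aligned}
k = 1:&\quad x_{11}^{(p)} = \tfrac{1}{2}\left[2 + x_{01}^{(p)} - x_{02}^{(p)}\right], &\quad x_{12}^{(p)} &= \tfrac{1}{2}\left[2 - x_{01}^{(p)} + x_{02}^{(p)}\right] \\
k = 2:&\quad x_{21}^{(p)} = \tfrac{1}{5}\left[-2 + 4x_{11}^{(p)} + 2x_{12}^{(p)}\right], &\quad x_{22}^{(p)} &= \tfrac{1}{5}\left[4 + 2x_{11}^{(p)} + x_{12}^{(p)}\right] \\
k = 3:&\quad x_{31}^{(p)} = \tfrac{1}{10}\left[9 + x_{21}^{(p)} + 3x_{22}^{(p)}\right], &\quad x_{32}^{(p)} &= \tfrac{1}{10}\left[-3 + 3x_{21}^{(p)} + 9x_{22}^{(p)}\right]
\end{aligned}$$

where $(x_{01}^{(p+1)}, x_{02}^{(p+1)}) = (x_{31}^{(p)}, x_{32}^{(p)})$ for p = 1, 2, ....

(b) Show that the three pairs of equations in part (a) can be
combined to produce

$$\begin{aligned}
x_{31}^{(p)} &= \tfrac{1}{20}\left[28 + x_{31}^{(p-1)} - x_{32}^{(p-1)}\right] \\
x_{32}^{(p)} &= \tfrac{1}{20}\left[24 + 3x_{31}^{(p-1)} - 3x_{32}^{(p-1)}\right]
\end{aligned} \qquad p = 1, 2, \ldots$$

where $(x_{31}^{(0)}, x_{32}^{(0)}) = (x_{01}^{(1)}, x_{02}^{(1)}) = \mathbf{x}_0^{(1)}$. [Note: Using this
pair of equations, we can perform one complete cycle of
three orthogonal projections in a single step.]

(c) Because $\mathbf{x}_3^{(p)}$ tends to the limit point $\mathbf{x}_3^*$ as p → ∞, the equations
in part (b) become

$$\begin{aligned}
x_{31}^* &= \tfrac{1}{20}\left[28 + x_{31}^* - x_{32}^*\right] \\
x_{32}^* &= \tfrac{1}{20}\left[24 + 3x_{31}^* - 3x_{32}^*\right]
\end{aligned}$$

as p → ∞. Solve this linear system for $\mathbf{x}_3^* = (x_{31}^*, x_{32}^*)$.
[Note: The simplifications of the ART formulas described
in this exercise are impractical for the large linear systems
that arise in realistic computed tomography problems.]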

2. Use the result of Exercise 1(b) to find $\mathbf{x}_3^{(1)}$, $\mathbf{x}_3^{(2)}$, ..., $\mathbf{x}_3^{(6)}$ to five
decimal places in Example 1 using the following initial points:

(a) $\mathbf{x}_0 = (0, 0)$ (b) $\mathbf{x}_0 = (1, 1)$

(c) $\mathbf{x}_0 = (148, -15)$

3. (a) Show directly that the points of the limit cycle in Example 1,

$$\mathbf{x}_1^* = \left(\tfrac{12}{11}, \tfrac{10}{11}\right), \quad \mathbf{x}_2^* = \left(\tfrac{46}{55}, \tfrac{78}{55}\right), \quad \mathbf{x}_3^* = \left(\tfrac{31}{22}, \tfrac{27}{22}\right)$$

form a triangle whose vertices lie on the lines $L_1$, $L_2$, and
$L_3$ and whose sides are perpendicular to these lines (Figure
10.11.9c).

(b) Using the equations derived in Exercise 1(a), show that if
$\mathbf{x}_0^{(1)} = \mathbf{x}_3^* = \left(\tfrac{31}{22}, \tfrac{27}{22}\right)$, then

$$\mathbf{x}_1^{(1)} = \mathbf{x}_1^* = \left(\tfrac{12}{11}, \tfrac{10}{11}\right), \quad \mathbf{x}_2^{(1)} = \mathbf{x}_2^* = \left(\tfrac{46}{55}, \tfrac{78}{55}\right), \quad \mathbf{x}_3^{(1)} = \mathbf{x}_3^* = \left(\tfrac{31}{22}, \tfrac{27}{22}\right)$$

[Note: Either part of this exercise shows that successive
orthogonal projections of any point on the limit cycle will
move around the limit cycle indefinitely.]

4. The following three lines in the $x_1x_2$-plane,

$$\begin{aligned}
L_1:&\quad x_2 = 1 \\
L_2:&\quad x_1 - x_2 = 2 \\
L_3:&\quad x_1 - x_2 = 0
\end{aligned}$$

do not have a common intersection. Draw an accurate sketch of
the three lines and graphically perform several cycles of the
orthogonal projections described in Algorithm 1, beginning with
the initial point $\mathbf{x}_0 = (0, 0)$. On the basis of your sketch,
determine the three points of the limit cycle.

5. Prove Theorem 10.11.1 by verifying that

(a) the point $\mathbf{x}_p$ as defined in the theorem lies on the line
$\mathbf{a}^T\mathbf{x} = b$ (i.e., $\mathbf{a}^T\mathbf{x}_p = b$).

(b) the vector $\mathbf{x}_p - \mathbf{x}^*$ is orthogonal to the line $\mathbf{a}^T\mathbf{x} = b$ (i.e.,
$\mathbf{x}_p - \mathbf{x}^*$ is parallel to $\mathbf{a}$).

6. As stated in the text, the iterates $\mathbf{x}_M^{(1)}$, $\mathbf{x}_M^{(2)}$, $\mathbf{x}_M^{(3)}$, ... defined in
Algorithm 2 will converge to a unique limit point $\mathbf{x}_M^*$ if the vectors
$\mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_M$ span $R^N$. Show that if this is the case and
if the center-of-pixel method is used, then the center of each of
the N pixels in the field of view is crossed by at least one of the
M beams in the scan.

7. Construct the 12 beam equations in Example 2 using the center
line method. Assume that the distance between the center lines
of adjacent beams is equal to the width of a single pixel.

8. Construct the 12 beam equations in Example 2 using the area
method. Assume that the width of each beam is equal to the
width of a single pixel and that the distance between the center
lines of adjacent beams is also equal to the width of a single
pixel.

Working with Technology

The following exercises are designed to be solved using a technology
utility. Typically, this will be MATLAB, Mathematica, Maple,
Derive, or Mathcad, but it may also be some other type of linear
algebra software or a scientific calculator with some linear algebra
capabilities. For each exercise you will need to read the relevant
documentation for the particular utility you are using. The goal
of these exercises is to provide you with a basic proficiency with
your technology utility. Once you have mastered the techniques
in these exercises, you will be able to use your technology utility
to solve many of the problems in the regular exercise sets.

T1. Given the set of equations

$$a_kx + b_ky = c_k$$

for k = 1, 2, 3, ..., n (with n > 2), let us consider the following
algorithm for obtaining an approximate solution to the system.

1. Solve all possible pairs of equations

$$a_ix + b_iy = c_i \quad\text{and}\quad a_jx + b_jy = c_j$$

for i, j = 1, 2, 3, ..., n and i < j for their unique solutions.
This leads to

$$\tfrac{1}{2}n(n - 1)$$

solutions, which we label as

$$(x_{ij}, y_{ij})$$

for i, j = 1, 2, 3, ..., n and i < j.

2. Construct the geometric center of these points defined by

$$(x_c, y_c) = \left(\frac{2}{n(n-1)}\sum_{i=1}^{n-1}\sum_{j=i+1}^{n} x_{ij},\ \frac{2}{n(n-1)}\sum_{i=1}^{n-1}\sum_{j=i+1}^{n} y_{ij}\right)$$

and use this as the approximate solution to the original
system.

Use this algorithm to approximate the solution to the system

$$\begin{aligned}
x + y &= 2 \\
x - 2y &= -2 \\
3x - y &= 3
\end{aligned}$$

and compare your results to those in this section.
T2. (Calculus required) Given the set of equations

$$a_kx + b_ky = c_k$$

for k = 1, 2, 3, ..., n (with n > 2), let us consider the following
least squares algorithm for obtaining an approximate solution
$(x^*, y^*)$ to the system. Given a point $(\alpha, \beta)$ and the line
$a_ix + b_iy = c_i$, the distance from this point to the line is given by

$$\frac{|a_i\alpha + b_i\beta - c_i|}{\sqrt{a_i^2 + b_i^2}}$$

If we define a function f(x, y) by

$$f(x, y) = \sum_{i=1}^{n} \frac{(a_ix + b_iy - c_i)^2}{a_i^2 + b_i^2}$$

and then determine the point $(x^*, y^*)$ that minimizes this function,
we will determine the point that is closest to each of these
lines in a summed least squares sense. Show that $x^*$ and $y^*$ are
solutions to the system

$$\left(\sum_{i=1}^{n} \frac{a_i^2}{a_i^2 + b_i^2}\right)x^* + \left(\sum_{i=1}^{n} \frac{a_ib_i}{a_i^2 + b_i^2}\right)y^* = \sum_{i=1}^{n} \frac{a_ic_i}{a_i^2 + b_i^2}$$

and

$$\left(\sum_{i=1}^{n} \frac{a_ib_i}{a_i^2 + b_i^2}\right)x^* + \left(\sum_{i=1}^{n} \frac{b_i^2}{a_i^2 + b_i^2}\right)y^* = \sum_{i=1}^{n} \frac{b_ic_i}{a_i^2 + b_i^2}$$

Apply this algorithm to the system

$$\begin{aligned}
x + y &= 2 \\
x - 2y &= -2 \\
3x - y &= 3
\end{aligned}$$

and compare your results to those in this section.


10.12 Fractals 

In this section we will use certain classes of linear transformations to describe and generate 
intricate sets in the Euclidean plane. These sets, called fractals, are currently the focus of 
much mathematical and scientific research. 


Geometry of Linear Operators on $R^2$ (Section 4.11)

Euclidean Space $R^n$

Natural Logarithms 

Intuitive Understanding of Limits 


Fractals in the Euclidean Plane

At the end of the nineteenth century and the beginning of the twentieth century, various
bizarre and wild sets of points in the Euclidean plane began appearing in mathematics.
Although they were initially mathematical curiosities, these sets, called fractals, are
rapidly growing in importance. It is now recognized that they reveal a regularity in physical
and biological phenomena previously dismissed as "random," "noisy," or "chaotic."
For example, fractals are all around us in the shapes of clouds, mountains, coastlines,
trees, and ferns.

In this section we give a brief description of certain types of fractals in the Euclidean
plane $R^2$. Much of this description is an outgrowth of the work of two mathematicians,
Benoit B. Mandelbrot and Michael Barnsley, who are both active researchers in the field.


Self-Similar Sets

To begin our study of fractals, we need to introduce some terminology about sets in
$R^2$. We will call a set in $R^2$ bounded if it can be enclosed by a suitably large circle
(Figure 10.12.1) and closed if it contains all of its boundary points (Figure 10.12.2). Two
sets in $R^2$ will be called congruent if they can be made to coincide exactly by translating
and rotating them appropriately within $R^2$ (Figure 10.12.3). We will also rely on your
intuitive concept of overlapping and nonoverlapping sets, as illustrated in Figure 10.12.4.

If $T: R^2 \to R^2$ is the linear operator that scales by a factor of s (see Table 7 of
Section 4.9), and if Q is a set in $R^2$, then the set T(Q) (the set of images of points in Q
under T) is called a dilation of the set Q if s > 1 and a contraction of Q if 0 < s < 1
(Figure 10.12.5). In either case we say that T(Q) is the set Q scaled by the factor s.

▲ Figure 10.12.1 Unbounded set: (b) This set cannot be enclosed by any circle.

▲ Figure 10.12.2 The boundary points (solid color) lie in the set.

▲ Figure 10.12.4 (a) Overlapping sets (b) Nonoverlapping sets


The types of fractals we will consider first are called self-similar. In general, we define
a self-similar set in $R^2$ as follows:


DEFINITION 1 A closed and bounded subset of the Euclidean plane $R^2$ is said to be
self-similar if it can be expressed in the form

$$S = S_1 \cup S_2 \cup S_3 \cup \cdots \cup S_k \tag{1}$$

where $S_1, S_2, S_3, \ldots, S_k$ are nonoverlapping sets, each of which is congruent to S
scaled by the same factor s (0 < s < 1).


If S is a self-similar set, then (1) is sometimes called a decomposition of S into
nonoverlapping congruent sets.


► EXAMPLE 1 Line Segment

A line segment in $R^2$ (Figure 10.12.6a) can be expressed as the union of two nonoverlapping
congruent line segments (Figure 10.12.6b). In Figure 10.12.6b we have separated
the two line segments slightly so that they can be seen more easily. Each of these two
smaller line segments is congruent to the original line segment scaled by a factor of $\frac{1}{2}$.
Hence, a line segment is a self-similar set with k = 2 and $s = \frac{1}{2}$.

▲ Figure 10.12.6

▲ Figure 10.12.7

► EXAMPLE 2 Square 

A square (Figure 10.12.7a) can be expressed as the union of four nonoverlapping congruent
squares (Figure 10.12.7b), where we have again separated the smaller squares
slightly. Each of the four smaller squares is congruent to the original square scaled by a
factor of $\frac{1}{2}$. Hence, a square is a self-similar set with k = 4 and $s = \frac{1}{2}$.

► EXAMPLE 3 Sierpinski Carpet

The set suggested by Figure 10.12.8a, the Sierpinski "carpet," was first described by the
Polish mathematician Waclaw Sierpinski (1882-1969). It can be expressed as the union
of eight nonoverlapping congruent subsets (Figure 10.12.8b), each of which is congruent
to the original set scaled by a factor of $\frac{1}{3}$. Hence, it is a self-similar set with k = 8 and
$s = \frac{1}{3}$. Note that the intricate square-within-a-square pattern continues forever on a
smaller and smaller scale (although this can only be suggested in a figure such as the one
shown).

▲ Figure 10.12.8


► EXAMPLE 4 Sierpinski Triangle 

Figure 10.12.9a illustrates another set described by Sierpinski. It is a self-similar set
with k = 3 and $s = \frac{1}{2}$ (Figure 10.12.9b). As with the Sierpinski carpet, the intricate
triangle-within-a-triangle pattern continues forever on a smaller and smaller scale.



The Sierpinski carpet and triangle have a more intricate structure than the line seg- 
ment and the square in that they exhibit a pattern that is repeated indefinitely. This 
difference will be explored later in this section. 


Topological Dimension of a Set

In Section 4.5 we defined the dimension of a subspace of a vector space to be the number
of vectors in a basis, and we found that definition to coincide with our intuitive sense of
dimension. For example, the origin of $R^2$ is zero-dimensional, lines through the origin
are one-dimensional, and $R^2$ itself is two-dimensional. This definition of dimension is a
special case of a more general concept called topological dimension, which is applicable
to sets in $R^n$ that are not necessarily subspaces. A precise definition of this concept is
studied in a branch of mathematics called topology. Although that definition is beyond
the scope of this text, we can state informally that

a point in $R^2$ has topological dimension zero;
a curve in $R^2$ has topological dimension one;
a region in $R^2$ has topological dimension two.

It can be proved that the topological dimension of a set in $R^n$ must be an integer between
0 and n, inclusive. In this text we will denote the topological dimension of a set S by
$d_T(S)$.


► EXAMPLE 5 Topological Dimensions of Sets 

Table 1 gives the topological dimensions of the sets studied in our earlier examples. 
The first two results in this table are intuitively obvious; however, the last two are not. 
Informally stated, the Sierpinski carpet and triangle both contain so many “holes” that 
those sets resemble web-like networks of lines rather than regions. Hence they have 
topological dimension one. The proofs are quite difficult. 


Table 1 

Set S                   d_T(S) 
Line segment            1 
Square                  2 
Sierpinski carpet       1 
Sierpinski triangle     1 


Hausdorff Dimension of a Self-Similar Set 

In 1919 the German mathematician Felix Hausdorff (1868-1942) gave an alternative 
definition for the dimension of an arbitrary set in R^n. His definition is quite complicated, 
but for a self-similar set, it reduces to something rather simple: 


DEFINITION 2 The Hausdorff dimension of a self-similar set S of form (1) is denoted 
by d_H(S) and is defined by 

$$ d_H(S) = \frac{\ln k}{\ln(1/s)} \tag{2} $$


In this definition, “ln” denotes the natural logarithm function. Equation (2) can also be 
expressed as 

$$ s^{d_H(S)} = \frac{1}{k} \tag{3} $$

in which the Hausdorff dimension d_H(S) appears as an exponent. Formula (3) is more 
helpful for interpreting the concept of Hausdorff dimension; it states, for example, that 
if you scale a self-similar set by a factor of s = 1/2, then its area (or more properly its 
measure) decreases by a factor of (1/2)^{d_H(S)}. Thus, scaling a line segment by a factor of 
1/2 reduces its measure (length) by a factor of (1/2)^1 = 1/2, and scaling a square region by a 
factor of 1/2 reduces its measure (area) by a factor of (1/2)^2 = 1/4. 
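As a reading aid, the equivalence of (2) and (3) can be written out step by step. The derivation below is a sketch that assumes only k ≥ 1 and 0 < s < 1, as in the definition of a self-similar set, so that ln(1/s) is positive.

```latex
\begin{align*}
d_H(S) = \frac{\ln k}{\ln(1/s)}
  &\iff d_H(S)\,\ln(1/s) = \ln k \\
  &\iff \ln\!\left[(1/s)^{d_H(S)}\right] = \ln k \\
  &\iff (1/s)^{d_H(S)} = k \\
  &\iff s^{d_H(S)} = \frac{1}{k}.
\end{align*}
```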

Before proceeding to some examples, we should note a few facts about the Hausdorff 
dimension of a set: 

The topological dimension and Hausdorff dimension of a set need not be the same. 
The Hausdorff dimension of a set need not be an integer. 

The topological dimension of a set is less than or equal to its Hausdorff dimension; 
that is, d_T(S) ≤ d_H(S). 
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► EXAMPLE 6 Hausdorff Dimensions of Sets 

Table 2 lists the Hausdorff dimensions of the sets studied in our earlier examples. 


Table 2 

Set S                   s      k     d_H(S) = ln k / ln(1/s) 
Line segment            1/2    2     ln 2/ln 2 = 1 
Square                  1/2    4     ln 4/ln 2 = 2 
Sierpinski carpet       1/3    8     ln 8/ln 3 = 1.892... 
Sierpinski triangle     1/2    3     ln 3/ln 2 = 1.584... 
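The entries of Table 2 can be checked numerically. The following Python sketch evaluates Formula (2) for the four (k, s) pairs in the table; the function name `hausdorff_dimension` and the dictionary `examples` are our own illustrative choices and do not come from the text.

```python
import math


def hausdorff_dimension(k: int, s: float) -> float:
    """Evaluate Formula (2), ln k / ln(1/s), for a self-similar set."""
    if k < 1 or not (0 < s < 1):
        raise ValueError("need k >= 1 and 0 < s < 1")
    return math.log(k) / math.log(1 / s)


# (k, s) pairs copied from Table 2.
examples = {
    "Line segment": (2, 1 / 2),
    "Square": (4, 1 / 2),
    "Sierpinski carpet": (8, 1 / 3),
    "Sierpinski triangle": (3, 1 / 2),
}

for name, (k, s) in examples.items():
    print(f"{name}: d_H = {hausdorff_dimension(k, s):.4f}")
```

Only the carpet and the triangle produce noninteger values, which is the distinction the discussion of fractals relies on.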


Fractals 

Comparing Tables 1 and 2, we see that the Hausdorff and topological dimensions are 
equal for both the line segment and square but are unequal for the Sierpinski carpet and 
triangle. In 1977 Benoit B. Mandelbrot suggested that sets for which the topological 
and Hausdorff dimensions differ must be quite complicated (as Hausdorff had earlier 
suggested in 1919). Mandelbrot proposed calling such sets fractals, and he offered the 
following definition. 


DEFINITION 3 A fractal is a subset of a Euclidean space whose Hausdorff dimension 
and topological dimension are not equal. 


According to this definition, the Sierpinski carpet and Sierpinski triangle are fractals, 
whereas the line segment and square are not. 

It follows from the preceding definition that a set whose Hausdorff dimension is not 
an integer must be a fractal (why?). However, we will see later that the converse is not 
true; that is, it is possible for a fractal to have an integer Hausdorff dimension. 

Similitudes 

We will now show how some techniques from linear algebra can be used to generate 
fractals. This linear algebra approach also leads to algorithms that can be exploited to 
draw fractals on a computer. We begin with a definition. 


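DEFINITION 4 A similitude with scale factor s is a mapping of R^2 into R^2 of the form 

$$ T\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) = s\begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} e \\ f \end{bmatrix} $$

where s, θ, e, and f are scalars. 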







Geometrically, a similitude is a composition of three simpler mappings: a scaling by 
a factor of s, a rotation about the origin through an angle θ, and a translation (e units 
in the x-direction and f units in the y-direction). Figure 10.12.10 illustrates the effect 
of a similitude on the unit square U. 

For our application to fractals, we will need only similitudes that are contractions, by 
which we mean that the scale factor s is restricted to the range 0 < s < 1 . Consequently, 
when we refer to similitudes we will always mean similitudes subject to this restriction. 
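The definition of a similitude translates directly into a few lines of code. The sketch below is illustrative only: the factory function `similitude` and the sample parameter values are assumptions of ours, chosen to show the scaling, rotation, and translation acting on the corners of the unit square.

```python
import math


def similitude(s, theta, e, f):
    """Return the map T(x, y) = s * R(theta) * (x, y) + (e, f) as a function."""
    if not (0 < s < 1):
        raise ValueError("a contraction requires 0 < s < 1")
    c, sn = math.cos(theta), math.sin(theta)

    def T(point):
        x, y = point
        # Rotate by theta, scale by s, then translate by (e, f).
        return (s * (c * x - sn * y) + e, s * (sn * x + c * y) + f)

    return T


# Corners of the unit square U, listed counterclockwise.
unit_square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]

# Illustrative parameter values (not taken from the text).
T = similitude(0.5, math.pi / 2, 1.0, 0.0)
image = [T(p) for p in unit_square]
```

Because a rotation and a translation preserve distances, every side of the image square has length s times the original side, which is the congruence property used throughout this section.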
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( 0 , 1 ) 


► Figure 10.12.10 (a) Unit square; (b) Unit square after similitude 

Similitudes are important in the study of fractals because of the following fact: 


If T: R^2 → R^2 is a similitude with scale factor s and if S is a closed and bounded set in 
R^2, then the image T(S) of the set S under T is congruent to S scaled by s. 


▲ Figure 10.12.11 (a) The unit square U and the line segment S; (b) the images T_1(U), T_2(U) and T_1(S), T_2(S) 


Recall from the definition of a self-similar set in R^2 that a closed and bounded set S in 
R^2 is self-similar if it can be expressed in the form 

$$ S = S_1 \cup S_2 \cup S_3 \cup \cdots \cup S_k $$

where S_1, S_2, S_3, ..., S_k are nonoverlapping sets each of which is congruent to S scaled 
by the same factor s (0 < s < 1) [see (1)]. In the following examples, we will find 
similitudes that produce the sets S_1, S_2, S_3, ..., S_k from S for the line segment, square, 
Sierpinski carpet, and Sierpinski triangle. 


► EXAMPLE 7 Line Segment 

We will take as our line segment the line segment S connecting the points (0, 0) and 
(1, 0) in the xy-plane (Figure 10.12.11a). Consider the two similitudes 

$$ T_1\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) = \frac{1}{2}\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix}, \qquad T_2\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) = \frac{1}{2}\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} \tfrac{1}{2} \\ 0 \end{bmatrix} \tag{4} $$

both of which have s = 1/2 and θ = 0. In Figure 10.12.11b we show how these two 
similitudes map the unit square U. The similitude T_1 maps U onto the smaller square 
T_1(U), and the similitude T_2 maps U onto the smaller square T_2(U). At the same 
time, T_1 maps the line segment S onto the smaller line segment T_1(S), and T_2 maps S 
onto the smaller nonoverlapping line segment T_2(S). The union of these two smaller 
nonoverlapping line segments is precisely the original line segment S; that is, 

$$ S = T_1(S) \cup T_2(S) \tag{5} $$


► EXAMPLE 8 Square 

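Let us consider the unit square U in the xy-plane (Figure 10.12.12a) and the following 
four similitudes, all having s = 1/2 and θ = 0: 

$$ \begin{aligned}
T_1\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) &= \frac{1}{2}\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix}, &
T_2\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) &= \frac{1}{2}\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} \tfrac{1}{2} \\ 0 \end{bmatrix}, \\
T_3\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) &= \frac{1}{2}\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} 0 \\ \tfrac{1}{2} \end{bmatrix}, &
T_4\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) &= \frac{1}{2}\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} \tfrac{1}{2} \\ \tfrac{1}{2} \end{bmatrix}
\end{aligned} \tag{6} $$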
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( 0 , 1 ) 


▲ Figure 10.12.12 (a) The unit square U; (b) the images T_1(U), T_2(U), T_3(U), T_4(U) 


The images of the unit square U under these four similitudes are the four squares shown 
in Figure 10.12.12b. Thus, 

$$ U = T_1(U) \cup T_2(U) \cup T_3(U) \cup T_4(U) \tag{7} $$

is a decomposition of U into four nonoverlapping squares that are congruent to U scaled 
by the same scale factor (s = 1/2). 


► EXAMPLE 9 Sierpinski Carpet 


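Let us consider a Sierpinski carpet S over the unit square U of the xy-plane (Figure 
10.12.13a) and the following eight similitudes, all having s = 1/3 and θ = 0: 

$$ T_i\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) = \frac{1}{3}\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} e_i \\ f_i \end{bmatrix}, \qquad i = 1, 2, 3, \ldots, 8 \tag{8} $$

where the eight values of the translation vector with entries e_i and f_i are 

$$ \begin{bmatrix} 0 \\ 0 \end{bmatrix},\ \begin{bmatrix} \tfrac{1}{3} \\ 0 \end{bmatrix},\ \begin{bmatrix} \tfrac{2}{3} \\ 0 \end{bmatrix},\ \begin{bmatrix} 0 \\ \tfrac{1}{3} \end{bmatrix},\ \begin{bmatrix} \tfrac{2}{3} \\ \tfrac{1}{3} \end{bmatrix},\ \begin{bmatrix} 0 \\ \tfrac{2}{3} \end{bmatrix},\ \begin{bmatrix} \tfrac{1}{3} \\ \tfrac{2}{3} \end{bmatrix},\ \begin{bmatrix} \tfrac{2}{3} \\ \tfrac{2}{3} \end{bmatrix} $$

The images of S under these eight similitudes are the eight sets shown in Figure 10.12.13b. 
Thus, 

$$ S = T_1(S) \cup T_2(S) \cup T_3(S) \cup \cdots \cup T_8(S) \tag{9} $$

is a decomposition of S into eight nonoverlapping sets that are congruent to S scaled by 
the same scale factor (s = 1/3). 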


► Figure 10.12.13 



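► EXAMPLE 10 Sierpinski Triangle 

Let us consider a Sierpinski triangle S fitted inside the unit square U of the xy-plane, 
as shown in Figure 10.12.14a, and the following three similitudes, all having s = 1/2 and 
θ = 0: 

$$ \begin{aligned}
T_1\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) &= \frac{1}{2}\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} \\
T_2\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) &= \frac{1}{2}\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} \tfrac{1}{2} \\ 0 \end{bmatrix} \\
T_3\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) &= \frac{1}{2}\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} \tfrac{1}{4} \\ \tfrac{1}{2} \end{bmatrix}
\end{aligned} \tag{10} $$

The images of S under these three similitudes are the three sets in Figure 10.12.14b. 
Thus, 

$$ S = T_1(S) \cup T_2(S) \cup T_3(S) \tag{11} $$

is a decomposition of S into three nonoverlapping sets that are congruent to S scaled by 
the same scale factor (s = 1/2). 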


► Figure 10.12.14 




In the preceding examples we started with a specific set S and showed that it was 
self-similar by finding similitudes T_1, T_2, T_3, ..., T_k with the same scale factor such that 
T_1(S), T_2(S), T_3(S), ..., T_k(S) were nonoverlapping sets and such that 

$$ S = T_1(S) \cup T_2(S) \cup T_3(S) \cup \cdots \cup T_k(S) \tag{12} $$

The following theorem addresses the converse problem of determining a self-similar set 
from a collection of similitudes. 


THEOREM 10.12.1 If T_1, T_2, T_3, ..., T_k are contracting similitudes with the same scale 
factor, then there is a unique nonempty closed and bounded set S in the Euclidean plane 
such that 

$$ S = T_1(S) \cup T_2(S) \cup T_3(S) \cup \cdots \cup T_k(S) $$

Furthermore, if the sets T_1(S), T_2(S), T_3(S), ..., T_k(S) are nonoverlapping, then S is 
self-similar. 


Algorithms for Generating Fractals 

In general, there is no simple way to obtain the set S in the preceding theorem directly. 
We now describe an iterative procedure that will determine S from the similitudes that 
define it. We first give an example of the procedure and then give an algorithm for the 
general case. 

► EXAMPLE 11 Sierpinski Carpet 

Figure 10.12.15 shows the unit square region S_0 in the xy-plane, which will serve as 
an “initial” set for an iterative procedure for the construction of the Sierpinski carpet. 
The set S_1 in the figure is the result of mapping S_0 with each of the eight similitudes T_i 
(i = 1, 2, ..., 8) in (8) that determine the Sierpinski carpet. It consists of eight square 
regions, each of side length 1/3, surrounding an empty middle square. Next we apply the 
eight similitudes to S_1 and arrive at the set S_2. Similarly, applying the eight similitudes 
to S_2 results in the set S_3. If we continue this process indefinitely, the sequence of sets 
S_1, S_2, S_3, ... will “converge” to a set S, which is the Sierpinski carpet. 
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► Figure 10.12.15 




Remark Although we should properly give a definition of what it means for a sequence of sets 
to “converge” to a given set, an intuitive interpretation will suffice in this introductory treatment. 

Although we started in Figure 10.12.15 with the unit square region to arrive at the 
Sierpinski carpet, we could have started with any nonempty set S_0. The only restriction 
is that the set S_0 be closed and bounded. For example, if we start with the particular 
set S_0 shown in Figure 10.12.16, then S_1 is the set obtained by applying each of the eight 
similitudes in (8). Applying the eight similitudes to S_1 results in the set S_2. As before, 
applying the eight similitudes indefinitely yields the Sierpinski carpet S as the limiting 
set. 

► Figure 10.12.16 

The general algorithm illustrated in the preceding example is as follows: Let T_1, T_2, 
T_3, ..., T_k be contracting similitudes with the same scale factor, and for an arbitrary 
set Q in R^2, define the set J(Q) by 

$$ J(Q) = T_1(Q) \cup T_2(Q) \cup T_3(Q) \cup \cdots \cup T_k(Q) $$

The following algorithm generates a sequence of sets S_0, S_1, ..., S_n, ... that converges 
to the set S in Theorem 10.12.1. 

Algorithm 1 

Step 0. Choose an arbitrary nonempty closed and bounded set S_0 in R^2. 

Step 1. Compute S_1 = J(S_0). 

Step 2. Compute S_2 = J(S_1). 

Step 3. Compute S_3 = J(S_2). 

   ... 

Step n. Compute S_n = J(S_{n-1}). 
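Algorithm 1 can be sketched in Python for the Sierpinski carpet of Example 11. The representation below is an assumption made for illustration: each set S_n is stored as a collection of squares (x, y, side), where (x, y) is the lower left corner, and exact fractions are used so that no rounding occurs. The names `TRANSLATIONS`, `apply_all`, `iterate`, and `area` are hypothetical.

```python
from fractions import Fraction

THIRD = Fraction(1, 3)

# Translations (e_i, f_i) of the eight carpet similitudes in (8);
# the middle position (1/3, 1/3) is omitted.
TRANSLATIONS = [(i * THIRD, j * THIRD)
                for j in range(3) for i in range(3) if (i, j) != (1, 1)]


def apply_all(squares):
    """One step of Algorithm 1: the union of the images under all similitudes."""
    result = set()
    for (x, y, side) in squares:
        for (e, f) in TRANSLATIONS:
            # With theta = 0, a square maps to a square one third the size.
            result.add((THIRD * x + e, THIRD * y + f, THIRD * side))
    return result


def iterate(n):
    """Return S_n, starting from the unit square S_0."""
    squares = {(Fraction(0), Fraction(0), Fraction(1))}
    for _ in range(n):
        squares = apply_all(squares)
    return squares


def area(squares):
    """Total area of a collection of nonoverlapping squares."""
    return sum(side * side for (_, _, side) in squares)
```

Under these assumptions, `iterate(n)` contains 8^n squares of side 3^(-n). Storing squares rather than pixels keeps the sketch short, although a pixel-based version is closer to how the figures in this section would be drawn.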


► EXAMPLE 12 Sierpinski Triangle 

Let us construct the Sierpinski triangle determined by the three similitudes given in (10). 
The corresponding set mapping is J(Q) = T_1(Q) ∪ T_2(Q) ∪ T_3(Q). Figure 10.12.17 
shows an arbitrary closed and bounded set S_0; the first four iterates S_1, S_2, S_3, S_4; and 
the limiting set S (the Sierpinski triangle). 
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► Figure 10.12.18 


► Figure 10.12.19 


► EXAMPLE 13 Using Algorithm 1 

Consider the following two similitudes: 

$$ \begin{aligned}
T_1\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) &= \frac{1}{2}\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} \\
T_2\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) &= \frac{1}{2}\begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} .3 \\ .3 \end{bmatrix}
\end{aligned} $$

The actions of these two similitudes on the unit square U are illustrated in Figure 10.12.18. 
Here, the rotation angle θ is a parameter that we will vary to generate different 
self-similar sets. The self-similar sets determined by these two similitudes are shown in 
Figure 10.12.19 for various values of θ. For simplicity, we have not drawn the xy-axes, 
but in each case the origin is the lower left point of the set. These sets were generated on 
a computer using Algorithm 1 for the various values of θ. Because k = 2 and s = 1/2, it 
follows from (2) that the Hausdorff dimension of these sets for any value of θ is 1. It can 
be shown that the topological dimension of these sets is 1 for θ = 0 and 0 for all other 
values of θ. It follows that the self-similar set for θ = 0 is not a fractal [it is the straight 
line segment from (0, 0) to (.6, .6)], while the self-similar sets for all other values of θ are 
fractals. In particular, they are examples of fractals with integer Hausdorff dimension. 


A Monte Carlo Approach 
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The set-mapping approach of constructing self-similar sets described in Algorithm 1 is 
rather time-consuming on a computer because the similitudes involved must be applied 
to each of the many computer screen pixels in the successive iterated sets. In 1985 Michael 
Barnsley described an alternative, more practical method of generating a self-similar set 
defined through its similitudes. It is a so-called Monte Carlo method that takes advantage 
of probability theory. Barnsley refers to it as the Random Iteration Algorithm. 

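Let T_1, T_2, T_3, ..., T_k be contracting similitudes with the same scale factor. The 
following algorithm generates a sequence of points 

$$ \begin{bmatrix} x_0 \\ y_0 \end{bmatrix},\ \begin{bmatrix} x_1 \\ y_1 \end{bmatrix},\ \ldots,\ \begin{bmatrix} x_n \\ y_n \end{bmatrix},\ \ldots $$

that collectively converge to the set S in Theorem 10.12.1. 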
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5000 iterations 
▲ Figure 10.12.20 


More General Fractals 


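Algorithm 2 

Step 0. Choose an arbitrary point 

$$ \begin{bmatrix} x_0 \\ y_0 \end{bmatrix} $$

in S. 

Step 1. Choose one of the k similitudes at random, say T_{k_1}, and compute 

$$ \begin{bmatrix} x_1 \\ y_1 \end{bmatrix} = T_{k_1}\left(\begin{bmatrix} x_0 \\ y_0 \end{bmatrix}\right) $$

Step 2. Choose one of the k similitudes at random, say T_{k_2}, and compute 

$$ \begin{bmatrix} x_2 \\ y_2 \end{bmatrix} = T_{k_2}\left(\begin{bmatrix} x_1 \\ y_1 \end{bmatrix}\right) $$

   ... 

Step n. Choose one of the k similitudes at random, say T_{k_n}, and compute 

$$ \begin{bmatrix} x_n \\ y_n \end{bmatrix} = T_{k_n}\left(\begin{bmatrix} x_{n-1} \\ y_{n-1} \end{bmatrix}\right) $$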


On a computer screen the pixels corresponding to the points generated by this algorithm 
will fill out the pixel representation of the limiting set S. 

Figure 10.12.20 shows four stages of the Random Iteration Algorithm (after 5000, 15,000, 
45,000, and 100,000 iterations) that generate the Sierpinski carpet, starting with the 
initial point (x_0, y_0) = (0, 0). 


Remark Although Step 0 in the preceding algorithm requires the selection of an initial point in 
the set S, which may not be known in advance, this is not a serious problem. In practice, one can 
usually start with any point in R^2 and after a few iterations (say ten or so), the point generated 
will be sufficiently close to S that the algorithm will work correctly from that point on. 
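The Random Iteration Algorithm is short enough to sketch in Python. The version below is an illustration under stated assumptions: it uses the eight carpet similitudes of (8), starts at (0, 0), which lies in the carpet, and draws its random choices from Python's standard `random` module with a fixed seed so that runs are repeatable. The names `random_iteration` and `in_middle_hole` are hypothetical.

```python
import random

# Translations (e_i, f_i) of the eight carpet similitudes in (8);
# every similitude has s = 1/3 and theta = 0.
TRANSLATIONS = [(i / 3, j / 3)
                for j in range(3) for i in range(3) if (i, j) != (1, 1)]


def random_iteration(n_points, start=(0.0, 0.0), seed=0):
    """Generate n_points successive iterates of Algorithm 2 for the carpet."""
    rng = random.Random(seed)
    x, y = start
    points = []
    for _ in range(n_points):
        e, f = rng.choice(TRANSLATIONS)   # pick one similitude at random
        x, y = x / 3 + e, y / 3 + f       # apply it to the current point
        points.append((x, y))
    return points


def in_middle_hole(x, y):
    """True if (x, y) lies strictly inside the removed middle square."""
    return 1 / 3 < x < 2 / 3 and 1 / 3 < y < 2 / 3


points = random_iteration(5000)
```

Plotting `points` with any graphics utility should give a picture comparable to the first stage described for Figure 10.12.20. If the starting point is not known to lie in S, the first ten or so iterates can simply be discarded, as the Remark suggests.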

So far, we have discussed fractals that are self-similar sets according to the definition 
of a self-similar set in R^2. However, Theorem 10.12.1 remains true if the similitudes 
T_1, T_2, ..., T_k are replaced by more general transformations, called contracting affine 
transformations. An affine transformation is defined as follows: 


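DEFINITION 5 An affine transformation is a mapping of R^2 into R^2 of the form 

$$ T\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) = \begin{bmatrix} a & b \\ c & d \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} e \\ f \end{bmatrix} $$

where a, b, c, d, e, and f are scalars. 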
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(0,l)o *( 1 , 1 ) 

▲ Figure 10.12.21 (a) Unit square; (b) Unit square after affine transformation 


► Figure 10.12.22 


Figure 10.12.21 shows how an affine transformation maps the unit square U onto a 
parallelogram T(U). An affine transformation is said to be contracting if the Euclidean 
distance between any two points in the plane is strictly decreased after the two points 
are mapped by the transformation. It can be shown that any k contracting affine 
transformations T_1, T_2, ..., T_k determine a unique closed and bounded set S satisfying the 
equation 

$$ S = T_1(S) \cup T_2(S) \cup T_3(S) \cup \cdots \cup T_k(S) \tag{13} $$

Equation (13) has the same form as Equation (12), which we used to find self-similar 
sets. Although Equation (13), which uses contracting affine transformations, does not 
determine a self-similar set S, the set it does determine has many of the features of 
self-similar sets. For example, Figure 10.12.22 shows how a set in the plane resembling a fern 
(an example made famous by Barnsley) can be generated through four contracting affine 
transformations. Note that the middle fern is the slightly overlapping union of the four 
smaller affine-image ferns surrounding it. Note also how T_1, because the determinant of 
its matrix part is zero, maps the entire fern onto the small straight line segment between 
the points (.50, 0) and (.50, .16). Figure 10.12.22 contains a wealth of information and 
should be studied carefully. 
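Whether a given affine transformation is contracting can be tested numerically. The sketch below rests on a standard fact that is not stated in the text: distances are strictly decreased exactly when the largest singular value of the matrix part is less than 1. The function names are hypothetical, and the sample coefficients are illustrative assumptions; in particular, the zero-determinant example is merely one map consistent with the segment from (.50, 0) to (.50, .16) described above, not a value read from Figure 10.12.22.

```python
import math


def affine(a, b, c, d, e, f):
    """Return the map T(x, y) = [[a, b], [c, d]] (x, y) + (e, f)."""
    def T(point):
        x, y = point
        return (a * x + b * y + e, c * x + d * y + f)
    return T


def largest_stretch(a, b, c, d):
    """Largest singular value of the matrix [[a, b], [c, d]]."""
    total = a * a + b * b + c * c + d * d      # trace of A^T A
    det = a * d - b * c                        # det(A); det(A^T A) = det^2
    disc = max(total * total - 4 * det * det, 0.0)
    return math.sqrt((total + math.sqrt(disc)) / 2)


def is_contracting(a, b, c, d):
    """True if the affine map with this matrix part shrinks all distances."""
    return largest_stretch(a, b, c, d) < 1


# Hypothetical map with zero determinant: it collapses the plane onto
# the vertical line x = 0.5 while scaling heights by 0.16.
stem = affine(0.0, 0.0, 0.0, 0.16, 0.5, 0.0)
```

A map can be contracting even when its matrix is singular, as the `stem` example shows, whereas a shear with factor 1 has determinant 1 yet fails the test because it stretches some directions.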


Coordinates labeled in Figure 10.12.22: (0, 0), (1, 0), (0, 1), (1, 1), (.115, 1.030), (.965, .990), (.575, -.086) 


Michael Barnsley has applied the above theory to the field of data compression and 
transmission. The fern, for example, is completely determined by the four affine 
transformations T_1, T_2, T_3, T_4. These four transformations, in turn, are determined by the 24 
numbers given in Figure 10.12.22 defining their corresponding values of a, b, c, d, e, and 
f. In other words, these 24 numbers completely encode the picture of the fern. Storing 
these 24 numbers in a computer requires considerably less memory space than storing a 
pixel-by-pixel description of the fern. In principle, any picture represented by a pixel map 
on a computer screen can be described through a finite number of affine transformations, 
although it is not easy to determine which transformations to use. Nevertheless, once 
encoded, the affine transformations generally require several orders of magnitude less 
computer memory than a pixel-by-pixel description of the pixel map. 


Further Readings 

Readers interested in learning more about fractals are referred to the following books, the first of 
which elaborates on the linear transformation approach of this section. 

1. MICHAEL BARNSLEY, Fractals Everywhere (New York: Academic Press, 1993). 

2. BENOIT B. MANDELBROT, The Fractal Geometry of Nature (New York: W. H. Freeman, 1982). 

3. HEINZ-OTTO PEITGEN AND P. H. RICHTER, The Beauty of Fractals (New York: Springer-Verlag, 1986). 

4. HEINZ-OTTO PEITGEN AND DIETMAR SAUPE, The Science of Fractal Images (New York: Springer-Verlag, 2011). 


Exercise Set 10.12 


1. The self-similar set in Figure Ex-1 has the sizes indicated. 
Given that its lower left corner is situated at the origin of the 
xy-plane, find the similitudes that determine the set. What is 
its Hausdorff dimension? Is it a fractal? 

2. Find the Hausdorff dimension of the self-similar set shown 
in Figure Ex-2. Use a ruler to measure the figure and determine 
an approximate value of the scale factor s. What are the 
rotation angles of the similitudes determining this set? 

3. Each of the 12 self-similar sets in Figure Ex-3 results from 
three similitudes with scale factor 1/2, and so all have Hausdorff 
dimension ln 3/ln 2 = 1.584.... The rotation angles of 
the three similitudes are all multiples of 90°. Find these rotation 
angles for each set and express them as a triplet of integers 
(n_1, n_2, n_3), where n_i is the corresponding integer multiple of 
90° in the order upper right, lower left, lower right. For example, 
the first set (the Sierpinski triangle) generates the triplet 
(0, 0, 0). 

▲ Figure Ex-3 

4. For each of the self-similar sets in Figure Ex-4, find: (i) the 
scale factor s of the similitudes describing the set; (ii) the rotation 
angles θ of all similitudes describing the set (all rotation 
angles are multiples of 90°); and (iii) the Hausdorff dimension 
of the set. Which of the sets are fractals and why? 

5. Show that of the four affine transformations shown in Figure 
10.12.22, only the transformation T_2 is a similitude. Determine 
its scale factor s and rotation angle θ. 

6. Find the coordinates of the tip of the fern in Figure 10.12.22. 
[Hint: The transformation T_2 maps the tip of the fern to itself.] 

7. The square in Figure 10.12.7a was expressed as the union of 4 
nonoverlapping squares as in Figure 10.12.7b. Suppose that it 
is expressed instead as the union of 16 nonoverlapping squares. 
Verify that its Hausdorff dimension is still 2, as determined by 
Equation (2). 

8. Show that the four similitudes 

T_1 

express the unit square as the union of four overlapping 
squares. Evaluate the right-hand side of Equation (2) for the 
values of k and s determined by these similitudes, and show 
that the result is not the correct value of the Hausdorff dimension 
of the unit square. [Note: This exercise shows the 
necessity of the nonoverlapping condition in the definition of 
a self-similar set and its Hausdorff dimension.] 

9. All of the results in this section can be extended to R^n. Compute 
the Hausdorff dimension of the unit cube in R^3 (see Figure 
Ex-9). Given that the topological dimension of the unit 
cube is 3, determine whether it is a fractal. [Hint: Express the 
unit cube as the union of eight smaller congruent nonoverlapping 
cubes.] 

◄ Figure Ex-9 

10. The set in R^3 in Figure Ex-10 is called the Menger sponge. 
It is a self-similar set obtained by drilling out certain square 
holes from the unit cube. Note that each face of the Menger 
sponge is a Sierpinski carpet and that the holes in the Sierpinski 
carpet now run all the way through the Menger sponge. 
Determine the values of k and s for the Menger sponge and 
find its Hausdorff dimension. Is the Menger sponge a fractal? 

▲ Figure Ex-10 

11. The two similitudes 

determine a fractal known as the Cantor set. Starting with the 
unit square region U as an initial set, sketch the first four sets 
that Algorithm 1 determines. Also, find the Hausdorff dimension 
of the Cantor set. (This famous set was the first example 
that Hausdorff gave in his 1919 paper of a set whose Hausdorff 
dimension is not equal to its topological dimension.) 

12. Compute the areas of the sets S_0, S_1, S_2, S_3, and S_4 in Figure 
10.12.15. 


Working with Technology 

The following exercises are designed to be solved using a technology 
utility. Typically, this will be MATLAB, Mathematica, Maple, 
Derive, or Mathcad, but it may also be some other type of linear 
algebra software or a scientific calculator with some linear algebra 
capabilities. For each exercise you will need to read the relevant 
documentation for the particular utility you are using. The goal 
of these exercises is to provide you with a basic proficiency with 
your technology utility. Once you have mastered the techniques 
in these exercises, you will be able to use your technology utility 
to solve many of the problems in the regular exercise sets. 

T1. Use similitudes of the form 

to show that the Menger sponge (see Exercise 10) is the set S satisfying 

$$ S = \bigcup_{i=1}^{20} T_i(S) $$

for appropriately chosen similitudes T_i (for i = 1, 2, 3, ..., 20). 
Determine these similitudes by determining the collection of 3 × 1 
matrices 

$$ \begin{bmatrix} a_i \\ b_i \\ c_i \end{bmatrix} \qquad \text{for } i = 1, 2, 3, \ldots, 20 $$

T2. Generalize the ideas involved in the Cantor set (in R^1), the 
Sierpinski carpet (in R^2), and the Menger sponge (in R^3) to R^n by 
considering the set S satisfying 

$$ S = \bigcup_{i=1}^{m_n} T_i(S) $$

where each a_{k,i} equals 0, 1/3, or 2/3, and no two of them ever equal 1/3 
at the same time. Use a computer to construct the set 

$$ \begin{bmatrix} a_{1,i} \\ a_{2,i} \\ \vdots \\ a_{n,i} \end{bmatrix} \qquad \text{for } i = 1, 2, 3, \ldots, m_n $$

thereby determining the value of m_n for n = 2, 3, 4. Then develop 
an expression for m_n. 


10.13 Chaos 

In this section we use a map of the unit square in the xy-plane onto itself to describe the 
concept of a chaotic mapping. 

PREREQUISITES: Geometry of Linear Operators on R^2 (Section 4.11) 
               Eigenvalues and Eigenvectors 
               Intuitive Understanding of Limits and Continuity 

Chaos 

The word chaos was first used in a mathematical sense in 1975 by Tien-Yien Li and 
James Yorke in a paper entitled “Period Three Implies Chaos.” The term is now used to 
describe the behavior of certain mathematical mappings and physical phenomena that 
at first glance seem to behave in a random or disorderly fashion but actually have an 
underlying element of order (examples include random-number generation, shuffling 
cards, cardiac arrhythmia, fluttering airplane wings, changes in the red spot of Jupiter, 
and deviations in the orbit of Pluto). In this section we discuss a particular chaotic 
mapping called Arnold’s cat map, after the Russian mathematician Vladimir I. Arnold 
who first described it using a diagram of a cat. 


Arnold's Cat Map 

To describe Arnold’s cat map, we need a few ideas about modular arithmetic. If x is a 
real number, then the notation x mod 1 denotes the unique number in the interval [0, 1) 
that differs from x by an integer. For example, 

    2.3 mod 1 = 0.3,   0.9 mod 1 = 0.9,   -3.7 mod 1 = 0.3,   2.0 mod 1 = 0 

Note that if x is a nonnegative number, then x mod 1 is simply the fractional part of 
x. If (x, y) is an ordered pair of real numbers, then the notation (x, y) mod 1 denotes 
(x mod 1, y mod 1). For example, 

    (2.3, -7.9) mod 1 = (0.3, 0.1) 

Observe that for every real number x, the point x mod 1 lies in the unit interval [0, 1) 
and that for every ordered pair (x, y), the point (x, y) mod 1 lies in the unit square 

    S = {(x, y) | 0 ≤ x < 1, 0 ≤ y < 1} 

Also observe that the upper boundary and the right-hand boundary of the square are 
not included in S. 
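The mod 1 operation is easy to experiment with. The Python sketch below defines it as x minus the greatest integer not exceeding x, which agrees with the definition above for negative numbers as well; exact fractions are used so that values such as 0.3 are not blurred by floating-point rounding. The names `mod1` and `mod1_pair` are our own.

```python
from fractions import Fraction
import math


def mod1(x):
    """Return x mod 1: the number in [0, 1) that differs from x by an integer."""
    return x - math.floor(x)


def mod1_pair(point):
    """Apply mod 1 to each coordinate of an ordered pair."""
    return tuple(mod1(c) for c in point)


# The worked examples from the text, entered as exact decimals.
samples = [Fraction("2.3"), Fraction("0.9"), Fraction("-3.7"), Fraction("2.0")]
results = [mod1(x) for x in samples]
pair = mod1_pair((Fraction("2.3"), Fraction("-7.9")))
```

Note that for a negative input such as -3.7 the result is 0.3, not 0.7, because the integer subtracted is -4.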

Arnold’s cat map is the transformation T: R^2 → R^2 defined by the formula 

    T: (x, y) → (x + y, x + 2y) mod 1 

or, in matrix notation, 

$$ T\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) = \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} \bmod 1 \tag{1} $$

To understand the geometry of Arnold’s cat map, it is helpful to write (1) in the factored 
form 

$$ T\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}\begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} \bmod 1 $$

which expresses Arnold’s cat map as the composition of a shear in the x-direction with 
factor 1, followed by a shear in the y-direction with factor 1. Because the computations 
are performed mod 1, T maps all points of R^2 into the unit square S. 

We will illustrate the effect of Arnold’s cat map on the unit square S, which is shaded 
in Figure 10.13.1a and contains a picture of a cat. It can be shown that it does not matter 
whether the mod 1 computations are carried out after each shear or at the very end. We 
will discuss both methods, first performing them at the end. The steps are as follows: 
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▲ Figure 10.13.1 


Step 1. Shear in the x-direction with factor 1: 

    (x, y) → (x + y, y) 

or, in matrix notation, 

$$ \begin{bmatrix} x \\ y \end{bmatrix} \to \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} x + y \\ y \end{bmatrix} $$

Step 2. Shear in the y-direction with factor 1: 

    (x, y) → (x, x + y) 

or, in matrix notation, 

$$ \begin{bmatrix} x \\ y \end{bmatrix} \to \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} x \\ x + y \end{bmatrix} $$

Step 3. Reassembly into S (Figure 10.13.1d): 

    (x, y) → (x, y) mod 1 

The geometric effect of the mod 1 arithmetic is to break up the parallelogram in 
Figure 10.13.1c and reassemble the pieces of S as shown in Figure 10.13.1d. 

For computer implementation, it is more convenient to perform the mod 1 arithmetic 
at each step, rather than at the end. With this approach there is a reassembly at each 
step, but the net effect is the same. The steps are as follows: 

Step 1. Shear in the x-direction with factor 1, followed by a reassembly into S (Figure 
10.13.2b): 

    (x, y) → (x + y, y) mod 1 

Step 2. Shear in the y-direction with factor 1, followed by a reassembly into S (Figure 
10.13.2c): 

    (x, y) → (x, x + y) mod 1 

▲ Figure 10.13.2 
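The claim that the two orderings of the mod 1 arithmetic give the same result can be checked on a grid of points. The following Python sketch is illustrative: `cat_map` reduces mod 1 once at the end, `cat_map_stepwise` reduces after each shear, and the grid size p = 12 is an arbitrary choice of ours. Exact fractions are used so that equality tests are meaningful.

```python
from fractions import Fraction
import math


def mod1(x):
    """Return x mod 1, the representative of x in [0, 1)."""
    return x - math.floor(x)


def cat_map(point):
    """Both shears first, then a single mod 1: (x, y) -> (x + y, x + 2y) mod 1."""
    x, y = point
    return (mod1(x + y), mod1(x + 2 * y))


def cat_map_stepwise(point):
    """Reassemble into S after each shear."""
    x, y = point
    x, y = mod1(x + y), mod1(y)      # Step 1: shear in x, then mod 1
    x, y = mod1(x), mod1(x + y)      # Step 2: shear in y, then mod 1
    return (x, y)


# Compare the two versions at every point (m/p, n/p) of a p-by-p grid.
p = 12
grid = [(Fraction(m, p), Fraction(n, p)) for m in range(p) for n in range(p)]
agree = all(cat_map(pt) == cat_map_stepwise(pt) for pt in grid)
```

The agreement follows because adding or removing integers from x before the second shear changes x + y only by an integer, which the final mod 1 discards.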

Repeated Mappings 

Chaotic mappings such as Arnold’s cat map usually arise in physical models in which an 
operation is performed repeatedly. For example, cards are mixed by repeated shuffles, 
paint is mixed by repeated stirs, water in a tidal basin is mixed by repeated tidal changes, 
and so forth. Thus, we are interested in examining the effect on S of repeated applications 
(or iterations) of Arnold’s cat map. Figure 10.13.3, which was generated on a computer, 
shows the effect of 25 iterations of Arnold’s cat map on the cat in the unit square S. Two 
interesting phenomena occur: 

    The cat returns to its original form at the 25th iteration. 

    At some of the intermediate iterations, the cat is decomposed into streaks that seem 
    to have a specific direction. 

Much of the remainder of this section is devoted to explaining these phenomena. 
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Periodic Points 


▲ Figure 10.13.3 Panels labeled Iteration 1 through Iteration 25; the picture is 101 pixels wide 


Our first goal is to explain why the cat in Figure 10.13.3 returns to its original configu- 
ration at the 25th iteration. For this purpose it will be helpful to think of a picture in the 
xy-plane as an assignment of colors to the points in the plane. For pictures generated on 
a computer screen or other digital device, hardware limitations require that a picture be 
broken up into discrete squares, called pixels. For example, in the computer-generated 
pictures in Figure 10.13.3 the unit square S is divided into a grid with 101 pixels on a 
side for a total of 10,201 pixels, each of which is black or white (Figure 10.13.4). An 
assignment of colors to pixels to create a picture is called a pixel map. 



► Figure 10.13.4 Enlarged view of cat's face showing individual pixels 
As shown in Figure 10.13.5, each pixel in S can be assigned a unique pair of coordinates
of the form (m/101, n/101) that identifies its lower left-hand corner, where m
and n are integers in the range 0, 1, 2, . . . , 100. We call these points pixel points because
each such point identifies a unique pixel. Instead of restricting the discussion to the case
where S is subdivided into an array with 101 pixels on a side, let us consider the more
general case where there are p pixels per side. Thus, each pixel map in S consists of $p^2$
pixels uniformly spaced 1/p units apart in both the x- and the y-directions. The pixel
points in S have coordinates of the form (m/p, n/p), where m and n are integers ranging
from 0 to p − 1.



► Figure 10.13.5 Pixel points of S, with axis marks at 0, 1/101, 2/101, 3/101, . . . , m/101, . . . , 100/101


Under Arnold’s cat map each pixel point of S is transformed into another pixel point
of S. To see why this is so, observe that the image of the pixel point (m/p, n/p) under
T is given in matrix form by

\[
\begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix}
\begin{bmatrix} m/p \\ n/p \end{bmatrix} \bmod 1
= \begin{bmatrix} (m+n)/p \\ (m+2n)/p \end{bmatrix} \bmod 1 \tag{2}
\]

The ordered pair ((m + n)/p, (m + 2n)/p) mod 1 is of the form (m'/p, n'/p), where
m' and n' lie in the range 0, 1, 2, . . . , p − 1. Specifically, m' and n' are the remainders
when m + n and m + 2n are divided by p, respectively. Consequently, each point in S
of the form (m/p, n/p) is mapped onto another point of the same form.

Because Arnold’s cat map transforms every pixel point of S into another pixel point of
S, and because there are only $p^2$ different pixel points in S, it follows that any given pixel
point must return to its original position after at most $p^2$ iterations of Arnold’s cat map.
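
The argument above can be tried out numerically. The following Python sketch is only an illustration and is not part of the text's development: it represents the pixel point (m/p, n/p) by its integer numerators, so the mod 1 arithmetic on coordinates becomes mod p arithmetic on m and n. The function names and the sample point (27/76, 58/76) are hypothetical choices made for this sketch.

```python
def cat_map_pixel(m, n, p):
    """Apply Arnold's cat map once to the pixel point (m/p, n/p).

    The point is stored as its integer numerators (m, n); reducing the
    numerators mod p is the same as reducing the coordinates mod 1.
    """
    return (m + n) % p, (m + 2 * n) % p


def point_period(m, n, p):
    """Count the iterations needed for (m/p, n/p) to return to itself."""
    start = (m, n)
    current = cat_map_pixel(m, n, p)
    steps = 1
    while current != start:
        current = cat_map_pixel(current[0], current[1], p)
        steps += 1
    return steps


if __name__ == "__main__":
    # Hypothetical sample point (27/76, 58/76), chosen only for illustration.
    print(cat_map_pixel(27, 58, 76))  # (9, 67)
    print(point_period(27, 58, 76))   # 9
```

Because the map sends pixel points to pixel points and can be undone, the loop always terminates, and the count it returns can never exceed $p^2$.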


► EXAMPLE 1 Using Formula (2) 

If p = 76, then (2) becomes

\[
\begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix}
\begin{bmatrix} m/76 \\ n/76 \end{bmatrix} \bmod 1
= \begin{bmatrix} (m+n)/76 \\ (m+2n)/76 \end{bmatrix} \bmod 1
\]

(verify). Because the point returns to its initial position on the ninth application of
Arnold’s cat map (but no sooner), the point is said to have period 9, and the set of nine
distinct iterates of the point is called a 9-cycle. Figure 10.13.6 shows this 9-cycle with
the initial point labeled 0 and its successive iterates labeled accordingly.

▲ Figure 10.13.6

In general, a point that returns to its initial position after n applications of Arnold’s
cat map, but does not return with fewer than n applications, is said to have period n, and
its set of n distinct iterates is called an n-cycle. Arnold’s cat map maps (0, 0) into (0, 0),
so this point has period 1. Points with period 1 are also called fixed points. We leave it as
an exercise (Exercise 11) to show that (0, 0) is the only fixed point of Arnold’s cat map.

Period Versus Pixel Width

► Figure 10.13.7 Period of the pixel map plotted against p, the side length of the unit square in pixels

If $P_1$ and $P_2$ are points with periods $q_1$ and $q_2$, respectively, then $P_1$ returns to its
initial position in $q_1$ iterations (but no sooner), and $P_2$ returns to its initial position in
$q_2$ iterations (but no sooner); thus, both points return to their initial positions in any
number of iterations that is a multiple of both $q_1$ and $q_2$. In general, for a pixel map with
$p^2$ pixel points of the form (m/p, n/p), we let Π(p) denote the least common multiple
of the periods of all the pixel points in the map [i.e., Π(p) is the smallest integer that
is divisible by all of the periods]. It follows that the pixel map will return to its initial
configuration in Π(p) iterations of Arnold’s cat map (but no sooner). For this reason, we
call Π(p) the period of the pixel map. In Exercise 4 we ask you to show that if p = 101,
then all pixel points have period 1, 5, or 25, so Π(101) = 25. This explains why the cat
in Figure 10.13.3 returned to its initial configuration in 25 iterations.
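
As a rough numerical check of this definition, the sketch below computes the period of a pixel map directly as the least common multiple of the periods of all of its pixel points. It is a brute-force illustration under the integer-numerator representation of pixel points; the function names are hypothetical and the approach is practical only for small p.

```python
from math import gcd


def point_period(m, n, p):
    """Period of the pixel point (m/p, n/p) under Arnold's cat map."""
    start = (m, n)
    current = ((m + n) % p, (m + 2 * n) % p)
    steps = 1
    while current != start:
        a, b = current
        current = ((a + b) % p, (a + 2 * b) % p)
        steps += 1
    return steps


def pixel_map_period(p):
    """Least common multiple of the periods of all p*p pixel points."""
    period = 1
    for m in range(p):
        for n in range(p):
            q = point_period(m, n, p)
            period = period * q // gcd(period, q)
    return period


if __name__ == "__main__":
    print(pixel_map_period(101))  # the text states that this value is 25
```

The work grows roughly like $p^2$ times the period, so for larger p a method based on powers of the matrix reduced mod p (the idea behind Exercise T1) is a better choice.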

Figure 10.13.7 shows how the period of a pixel map varies with p. Although the 
general tendency is for the period to increase as p increases, there is a surprising amount 
of irregularity in the graph. Indeed, there is no simple function that specifies this rela- 
tionship (see Exercise 1). 





Although a pixel map with p pixels on a side does not return to its initial configuration
until Π(p) iterations have occurred, various unexpected things can occur at
intermediate iterations. For example, Figure 10.13.8 shows a pixel map with p = 250 of
the famous Hungarian-American mathematician John von Neumann. It can be shown
that Π(250) = 750; hence, the pixel map will return to its initial configuration after
750 iterations of Arnold’s cat map (but no sooner). However, after 375 iterations the
pixel map is turned upside down, and after another 375 iterations (for a total of 750)
the pixel map is returned to its initial configuration. Moreover, there are so many pixel
points with periods that divide 750 that multiple ghostlike images of the original likeness
occur at intermediate iterations; at 195 iterations numerous miniatures of the original
likeness occur in diagonal rows.


▲ Figure 10.13.8 Pixel map of John von Neumann, 250 pixels on a side [Image: Photographer unknown. Courtesy of The Shelby White and Leon Levy Archives Center,
Institute for Advanced Study, Princeton, NJ, USA]
The Tiled Plane

Our next objective is to explain the cause of the linear streaks that occur in Figure 10.13.3.
For this purpose it will be helpful to view Arnold’s cat map another way. As defined, 
Arnold’s cat map is not a linear transformation because of the mod 1 arithmetic. How- 
ever, there is an alternative way of defining Arnold’s cat map that avoids the mod 1 
arithmetic and results in a linear transformation. For this purpose, imagine that the 
unit square S with its picture of the cat is a “tile,” and suppose that the entire plane is 
covered with such tiles, as in Figure 10.13.9. We say that the xy-plane has been tiled 
with the unit square. If we apply the matrix transformation in (1) to the entire tiled 
plane without performing the mod 1 arithmetic, then it can be shown that the portion 
of the image within S will be identical to the image that we obtained using the mod 1 
arithmetic (Figure 10.13.9). In short, the tiling results in the same pixel map in S as the 
mod 1 arithmetic, but in the tiled case Arnold’s cat map is a linear transformation. 

It is important to understand, however, that tiling and mod 1 arithmetic produce 
periodicity in different ways. If a pixel map in S has period n, then in the case of mod 1 
arithmetic, each point returns to its original position at the end of n iterations. In the 
case of tiling, points need not return to their original positions; rather, each point is 
replaced by a point of the same color at the end of n iterations. 
▲ Figure 10.13.9



Properties of Arnold's Cat Map

To understand the cause of the streaks in Figure 10.13.3, think of Arnold’s cat map as a
linear transformation on the tiled plane. Observe that the matrix

\[
C = \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix}
\]

that defines Arnold’s cat map is symmetric and has a determinant of 1. The fact that the
determinant is 1 means that multiplication by this matrix preserves areas; that is, the area
of any figure in the plane and the area of its image are the same. This is also true for figures
in S in the case of mod 1 arithmetic, since the effect of the mod 1 arithmetic is to cut up
the figure and reassemble the pieces without any overlap, as shown in Figure 10.13.1d.
Thus, in Figure 10.13.3 the area of the cat (whatever it is) is the same as the total area of
the blotches in each iteration.

The fact that the matrix is symmetric means that its eigenvalues are real and the
corresponding eigenvectors are perpendicular. We leave it for you to show that the
eigenvalues and corresponding eigenvectors of C are

\[
\lambda_1 = \frac{3+\sqrt{5}}{2} = 2.6180\ldots, \qquad
\lambda_2 = \frac{3-\sqrt{5}}{2} = 0.3819\ldots, \qquad
\mathbf{v}_1 = \begin{bmatrix} 1 \\ \frac{1+\sqrt{5}}{2} \end{bmatrix}, \qquad
\mathbf{v}_2 = \begin{bmatrix} 1 \\ \frac{1-\sqrt{5}}{2} \end{bmatrix}
\]

For each application of Arnold’s cat map, the eigenvalue $\lambda_1$ causes a stretching in the
direction of the eigenvector $\mathbf{v}_1$ by a factor of 2.6180 . . . , and the eigenvalue $\lambda_2$ causes
a compression in the direction of the eigenvector $\mathbf{v}_2$ by a factor of 0.3819 . . . .
Figure 10.13.10 shows a square centered at the origin whose sides are parallel to the two
eigenvector directions. Under the above mapping, this square is deformed into the
rectangle whose sides are also parallel to the two eigenvector directions. The areas of the
square and rectangle are the same.
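
The stretching and compression factors can be confirmed with a few lines of arithmetic. The sketch below is illustrative only: it takes the eigenvector forms (1, (1 + √5)/2) and (1, (1 − √5)/2) listed in Exercise T2, applies the matrix C to each, and compares the result with the eigenvalue times the vector. The function names are hypothetical.

```python
import math


def apply_c(v):
    """Multiply the cat map matrix C = [[1, 1], [1, 2]] by the vector v."""
    x, y = v
    return (x + y, x + 2 * y)


def eigenpairs():
    """Eigenvalues and eigenvectors of C, in the forms given in Exercise T2."""
    root5 = math.sqrt(5)
    lam1, lam2 = (3 + root5) / 2, (3 - root5) / 2
    v1 = (1.0, (1 + root5) / 2)
    v2 = (1.0, (1 - root5) / 2)
    return (lam1, v1), (lam2, v2)


if __name__ == "__main__":
    (lam1, v1), (lam2, v2) = eigenpairs()
    print(lam1, lam2)                       # stretching and compression factors
    print(lam1 * lam2)                      # product equals det C, up to rounding
    print(v1[0] * v2[0] + v1[1] * v2[1])    # dot product, zero up to rounding
    print(apply_c(v1), (lam1 * v1[0], lam1 * v1[1]))
```

The product of the two factors equals the determinant, 1, which is the numerical counterpart of the statement that the square and the rectangle have the same area.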

To explain the cause of the streaks in Figure 10.13.3, consider S to be part of the
tiled plane, and let p be a point of S with period n. Because we are considering tiling,
there is a point q in the plane with the same color as p that on successive iterations moves
toward the position initially occupied by p, reaching that position on the nth iteration.
Writing A for the matrix of Arnold’s cat map, this point is $q = (A^{-1})^n p = A^{-n} p$, since

\[
A^n q = A^n (A^{-n} p) = p
\]

Thus, with successive iterations, points of S flow away from their initial positions, while
at the same time other points in the plane (with corresponding colors) flow toward those
initial positions, completing their trip on the final iteration of the cycle. Figure 10.13.11
illustrates this in the case where n = 4, q = (−8/3, 5/3), and $p = A^4 q$ = (1/3, 2/3). Note that
p mod 1 = q mod 1 = (1/3, 2/3), so both points occupy the same positions on their
respective tiles. The outgoing point moves in the general direction of the eigenvector $\mathbf{v}_1$, as
indicated by the arrows in Figure 10.13.11, and the incoming point moves in the
general direction of eigenvector $\mathbf{v}_2$. It is the “flow lines” in the general directions of the
eigenvectors that form the streaks in Figure 10.13.3.

► Figure 10.13.10

► Figure 10.13.11

Nonperiodic Points

Thus far we have considered the effect of Arnold’s cat map on pixel points of the form 
(m/p, n/p) for an arbitrary positive integer p. We know that all such points are periodic. 
We now consider the effect of Arnold’s cat map on an arbitrary point (a, b) in S. We 
classify such points as rational if the coordinates a and b are both rational numbers, 
and irrational if at least one of the coordinates is irrational. Every rational point is 
periodic, since it is a pixel point for a suitable choice of p. For example, the rational
point $(r_1/s_1, r_2/s_2)$ can be written as $(r_1 s_2/s_1 s_2,\; r_2 s_1/s_1 s_2)$, so it is a pixel point with
$p = s_1 s_2$. It can be shown (Exercise 13) that the converse is also true: Every periodic
point must be a rational point.
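
Exact rational arithmetic makes this easy to observe. The sketch below is an illustration, with hypothetical function names and sample points: it iterates Arnold's cat map on a rational point using Python's `fractions.Fraction`, so no rounding occurs and the return to the starting point can be detected exactly.

```python
from fractions import Fraction


def cat_map(x, y):
    """One iteration of Arnold's cat map, (x, y) -> (x + y, x + 2y) mod 1."""
    return (x + y) % 1, (x + 2 * y) % 1


def rational_period(x, y):
    """Period of a rational point of S, found by exact iteration."""
    start = (x, y)
    current = cat_map(x, y)
    steps = 1
    while current != start:
        current = cat_map(current[0], current[1])
        steps += 1
    return steps


if __name__ == "__main__":
    # (1/2, 1/3) is a pixel point for p = 2 * 3 = 6.
    print(rational_period(Fraction(1, 2), Fraction(1, 3)))  # 12
    print(rational_period(Fraction(1, 3), Fraction(2, 3)))  # 4
```

The same loop run with floating-point inputs would not be reliable, since rounding errors destroy the exact recurrence; that is one reason to keep the coordinates as fractions.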

It follows from the preceding discussion that the irrational points in S are nonperiodic,
so that successive iterates of an irrational point $(x_0, y_0)$ in S must all be distinct
points in S. Figure 10.13.12, which was computer generated, shows an irrational point 
and selected iterates up to 100,000. For the particular irrational point that we selected, 
the iterates do not seem to cluster in any particular region of S; rather, they appear to 
be spread throughout S, becoming denser with successive iterations. 

The behavior of the iterates in Figure 10.13.12 is sufficiently important that there is 
some terminology associated with it. We say that a set D of points in S is dense in S 
if every circle centered at any point of S encloses points of D, no matter how small the 
radius of the circle is taken (Figure 10.13.13). It can be shown that the rational points 
are dense in S and the iterates of most (but not all) of the irrational points are dense in S. 




▲ Figure 10.13.12 Iterates of an irrational point after 10,000, 25,000, 50,000, and 100,000 iterations



Definition of Chaos

We know that under Arnold’s cat map, the rational points of S are periodic and dense in 
S and that some but not all of the irrational points have iterates that are dense in S. These 
are the basic ingredients of chaos. There are several definitions of chaos in current use, but 
the following one, which is an outgrowth of a definition introduced by Robert L. Devaney 
in 1986 in his book An Introduction to Chaotic Dynamical Systems (Benjamin/Cummings 
Publishing Company), is most closely related to our work. 

DEFINITION 1 A mapping T of S onto itself is said to be chaotic if: 

(i) S contains a dense set of periodic points of the mapping T.

(ii) There is a point in S whose iterates under T are dense in S.

Thus Arnold’s cat map satisfies the definition of a chaotic mapping. What is noteworthy 
about this definition is that a chaotic mapping exhibits an element of order and an 
element of disorder — the periodic points move regularly in cycles, but the points with 
dense iterates move irregularly, often obscuring the regularity of the periodic points. 
This fusion of order and disorder characterizes chaotic mappings. 


Dynamical Systems

Chaotic mappings arise in the study of dynamical systems. Informally stated, a dynamical
system can be viewed as a system that has a specific state or configuration at each point of
time but that changes its state with time. Chemical systems, ecological systems, electrical
systems, biological systems, economic systems, and so forth can be looked at in this way.
In a discrete-time dynamical system, the state changes at discrete points of time rather
than at each instant. In a discrete-time chaotic dynamical system, each state results from
a chaotic mapping of the preceding state. For example, if one imagines that Arnold’s cat 
map is applied at discrete points of time, then the pixel maps in Figure 10.13.3 can be 
viewed as the evolution of a discrete-time chaotic dynamical system from some initial 
set of states (each point of the cat is a single initial state) to successive sets of states. 

One of the fundamental problems in the study of dynamical systems is to predict 
future states of the system from a known initial state. In practice, however, the exact 
initial state is rarely known because of errors in the devices used to measure the initial 
state. It was believed at one time that if the measuring devices were sufficiently accurate 
and the computers used to perform the iteration were sufficiently powerful, then one 
could predict the future states of the system to any degree of accuracy. But the discovery 
of chaotic systems shattered this belief because it was found that for such systems the 
slightest error in measuring the initial state or in the computation of the iterates becomes 
magnified exponentially, thereby preventing an accurate prediction of future states. Let 
us demonstrate this sensitivity to initial conditions with Arnold’s cat map. 

Suppose that $P_0$ is a point in the xy-plane whose exact coordinates are (0.77837,
0.70904). A measurement error of 0.00001 is made in the y-coordinate, such that the
point is thought to be located at (0.77837, 0.70905), which we denote by $Q_0$. Both $P_0$
and $Q_0$ are pixel points with p = 100,000 (why?), and thus, since Π(100,000) = 75,000,
both return to their initial positions after 75,000 iterations. In Figure 10.13.14 we show
the first 50 iterates of $P_0$ under Arnold’s cat map as crosses and the first 50 iterates of
$Q_0$ as circles. Although $P_0$ and $Q_0$ are close enough that their symbols overlap initially,
only their first eight iterates have overlapping symbols; from the ninth iteration on their
iterates follow divergent paths.

It is possible to quantify the growth of the error from the eigenvalues and eigenvectors
of Arnold’s cat map. For this purpose we will think of Arnold’s cat map as a linear
transformation on the tiled plane. Recall from Figure 10.13.10 and the related discussion
that the projected distance between two points in S in the direction of the eigenvector $\mathbf{v}_1$
increases by a factor of 2.6180 . . . (= $\lambda_1$) with each iteration (Figure 10.13.15). After nine
iterations this projected distance increases by a factor of $(2.6180\ldots)^9 = 5777.99\ldots$,
and with an initial error of roughly 1/100,000 in the direction of $\mathbf{v}_1$, this distance is
0.05777 . . . , or about 1/17 the width of the unit square S. After 12 iterations this small
initial error grows to $(2.6180\ldots)^{12}/100{,}000 = 1.0368\ldots$, which is greater than the
width of S. Thus, we completely lose track of the true iterates within S after 12 iterations
because of the exponential growth of the initial error.
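
The growth figures quoted above follow from repeated multiplication by the larger eigenvalue. The short sketch below reproduces them; it is an illustration only, the function names are hypothetical, and it assumes (as the text does) that the whole initial error of 1/100,000 lies in the direction of the stretching eigenvector.

```python
import math

LAMBDA_1 = (3 + math.sqrt(5)) / 2  # stretching factor per iteration


def error_after(n, initial_error=1e-5):
    """Projected error after n iterations, growing by LAMBDA_1 each time."""
    return initial_error * LAMBDA_1 ** n


def iterations_until_lost(initial_error=1e-5, width=1.0):
    """Smallest n for which the projected error exceeds the width of S."""
    n = 0
    while error_after(n, initial_error) <= width:
        n += 1
    return n


if __name__ == "__main__":
    print(LAMBDA_1 ** 9)            # about 5777.99
    print(error_after(9))           # about 0.0578
    print(error_after(12))          # about 1.0368
    print(iterations_until_lost())  # 12
```

Reducing the measurement error by a factor of 10 delays the loss of accuracy by only two or three iterations, since each iteration multiplies the error by about 2.6.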

Although sensitivity to initial conditions limits the ability to predict the future evo- 
lution of dynamical systems, new techniques are presently being investigated to describe 
this future evolution in alternative ways. 
▲ Figure 10.13.15 


Exercise Set 10.13 


1. In a journal article [F. J. Dyson and H. Falk, “Period of a Discrete Cat Mapping,”
The American Mathematical Monthly, 99 (August-September 1992), pp. 603-614] the
following results concerning the nature of the function Π(p) were established:

(i) Π(p) = 3p if and only if $p = 2 \cdot 5^k$ for k = 1, 2, . . . .

(ii) Π(p) = 2p if and only if $p = 5^k$ for k = 1, 2, . . . or
$p = 6 \cdot 5^k$ for k = 0, 1, 2, . . . .

(iii) Π(p) ≤ 12p/7 for all other choices of p.

Find Π(250), Π(25), Π(125), Π(30), Π(10), Π(50),
Π(3750), Π(6), and Π(5).

2. Find all the n-cycles that are subsets of the 36 points in S of
the form (m/6, n/6) with m and n in the range 0, 1, 2, 3, 4, 5.
Then find Π(6).


3. (Fibonacci Shift-Register Random-Number Generator) A well-known method of
generating a sequence of “pseudorandom” integers $x_0, x_1, x_2, x_3, \ldots$ in the interval
from 0 to p − 1 is based on the following algorithm:

(i) Pick any two integers $x_0$ and $x_1$ from the range 0, 1, 2, . . . , p − 1.

(ii) Set $x_{n+1} = (x_n + x_{n-1}) \bmod p$ for n = 1, 2, . . . .

Here x mod p denotes the number in the interval from 0 to p − 1 that differs from x
by a multiple of p. For example, 35 mod 9 = 8 (because 8 = 35 − 3 · 9);
36 mod 9 = 0 (because 0 = 36 − 4 · 9); and −3 mod 9 = 6 (because 6 = −3 + 1 · 9).

(a) Generate the sequence of pseudorandom numbers that results from the choices
p = 15, $x_0 = 3$, and $x_1 = 7$ until the sequence starts repeating.

(b) Show that the following formula is equivalent to step (ii) of the algorithm:

\[
\begin{bmatrix} x_{n+1} \\ x_{n+2} \end{bmatrix}
= \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix}
\begin{bmatrix} x_{n-1} \\ x_n \end{bmatrix} \bmod p
\quad \text{for } n = 1, 2, 3, \ldots
\]

(c) Use the formula in part (b) to generate the sequence of vectors for the choices
p = 21, $x_0 = 5$, and $x_1 = 5$ until the sequence starts repeating.

If we take p = 1 and pick $x_0$ and $x_1$ from the interval [0, 1), then the above
random-number generator produces pseudorandom numbers in the interval [0, 1). The
resulting scheme is precisely Arnold’s cat map. Furthermore, if we eliminate the modular
arithmetic in the algorithm and take $x_0 = x_1 = 1$, then the resulting sequence of integers
is the famous Fibonacci sequence, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, . . . , in which each
number after the first two is the sum of the preceding two numbers.

4. For $C = \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix}$ it can be verified that

\[
C^{25} = \begin{bmatrix} 7{,}778{,}742{,}049 & 12{,}586{,}269{,}025 \\ 12{,}586{,}269{,}025 & 20{,}365{,}011{,}074 \end{bmatrix}
\]

It can also be verified that 12,586,269,025 is divisible by 101 and that when
7,778,742,049 and 20,365,011,074 are divided by 101, the remainder is 1.

(a) Show that every point in S of the form (m/101, n/101) returns to its starting
position after 25 iterations under Arnold’s cat map.

(b) Show that every point in S of the form (m/101, n/101) has period 1, 5, or 25.

(c) Show that the point (1/101, 0) has period greater than 5 by iterating it five times.

(d) Show that Π(101) = 25.

5. Show that for the mapping T : S → S defined by
T(x, y) = (x + y) mod 1, every point in S is a periodic
point. Why does this show that the mapping is not chaotic?

6. An Anosov automorphism on $R^2$ is a mapping from the unit
square S onto S of the form


\[
\begin{bmatrix} x \\ y \end{bmatrix} \to
\begin{bmatrix} a & b \\ c & d \end{bmatrix}
\begin{bmatrix} x \\ y \end{bmatrix} \bmod 1
\]

in which (i) a, b, c, and d are integers, (ii) the determinant
of the matrix is ±1, and (iii) the eigenvalues of the matrix
do not have magnitude 1. It can be shown that all Anosov
automorphisms are chaotic mappings.

(a) Show that Arnold’s cat map is an Anosov automorphism. 

(b) Which of the following are the matrices of an Anosov 
automorphism? 


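\[
\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}, \quad
\begin{bmatrix} 3 & 2 \\ 1 & 1 \end{bmatrix}, \quad
\begin{bmatrix} 5 & 7 \\ 2 & 3 \end{bmatrix}, \quad
\begin{bmatrix} 6 & 2 \\ 5 & 2 \end{bmatrix}
\]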


(c) Show that the following mapping of S onto S is not an Anosov automorphism.

\[
\begin{bmatrix} x \\ y \end{bmatrix} \to
\begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}
\begin{bmatrix} x \\ y \end{bmatrix} \bmod 1
\]

What is the geometric effect of this transformation on S? Use your observation to show
that the mapping is not a chaotic mapping by showing that all points in S are periodic
points.

7. Show that Arnold’s cat map is one-to-one over the unit square S and that its range is S.

8. Show that the inverse of Arnold’s cat map is given by

\[
\Gamma^{-1}(x, y) = (2x - y,\; -x + y) \bmod 1
\]

9. Show that the unit square S can be partitioned into four triangular regions on each of
which Arnold’s cat map is a transformation of the form

\[
\begin{bmatrix} x \\ y \end{bmatrix} \to
\begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix}
\begin{bmatrix} x \\ y \end{bmatrix} +
\begin{bmatrix} a \\ b \end{bmatrix}
\]

where a and b need not be the same for each region. [Hint: Find the regions in S that
map onto the four shaded regions of the parallelogram in Figure 10.13.1d.]

10. If $(x_0, y_0)$ is a point in S and $(x_n, y_n)$ is its nth iterate under Arnold’s cat map,
show that

\[
\begin{bmatrix} x_n \\ y_n \end{bmatrix} =
\begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix}^{n}
\begin{bmatrix} x_0 \\ y_0 \end{bmatrix} \bmod 1
\]

This result implies that the modular arithmetic need only be performed once rather than
after each iteration.

11. Show that (0, 0) is the only fixed point of Arnold’s cat map by showing that the only
solution of the equation

\[
\begin{bmatrix} x_0 \\ y_0 \end{bmatrix} =
\begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix}
\begin{bmatrix} x_0 \\ y_0 \end{bmatrix} \bmod 1
\]

with $0 \le x_0 < 1$ and $0 \le y_0 < 1$ is $x_0 = y_0 = 0$. [Hint: For appropriate nonnegative
integers, r and s, we can write

\[
\begin{bmatrix} x_0 \\ y_0 \end{bmatrix} =
\begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix}
\begin{bmatrix} x_0 \\ y_0 \end{bmatrix} -
\begin{bmatrix} r \\ s \end{bmatrix}
\]

for the preceding equation.]

12. Find all 2-cycles of Arnold’s cat map by finding all solutions of the equation

\[
\begin{bmatrix} x_0 \\ y_0 \end{bmatrix} =
\begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix}^{2}
\begin{bmatrix} x_0 \\ y_0 \end{bmatrix} \bmod 1
\]

with $0 \le x_0 < 1$ and $0 \le y_0 < 1$. [Hint: For appropriate nonnegative integers, r and s,
we can write

\[
\begin{bmatrix} x_0 \\ y_0 \end{bmatrix} =
\begin{bmatrix} 2 & 3 \\ 3 & 5 \end{bmatrix}
\begin{bmatrix} x_0 \\ y_0 \end{bmatrix} -
\begin{bmatrix} r \\ s \end{bmatrix}
\]

for the preceding equation.]

13. Show that every periodic point of Arnold’s cat map must be a rational point by showing
that for all solutions of the equation

\[
\begin{bmatrix} x_0 \\ y_0 \end{bmatrix} =
\begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix}^{n}
\begin{bmatrix} x_0 \\ y_0 \end{bmatrix} \bmod 1
\]

the numbers $x_0$ and $y_0$ are quotients of integers.

14. Let T be the Arnold’s cat map applied five times in a row; that is, $T = \Gamma^5$.
Figure Ex-14 represents four successive mappings of T on the first image, each image
having a resolution of 101 × 101 pixels. The fifth mapping returns to the first image
because this cat map has a period of 25. Explain how you might generate this particular
sequence of images.

▲ Figure Ex-14

Working with Technology

The following exercises are designed to be solved using a technology utility. Typically,
this will be MATLAB, Mathematica, Maple, Derive, or Mathcad, but it may also be some
other type of linear algebra software or a scientific calculator with some linear algebra
capabilities. For each exercise you will need to read the relevant documentation for the
particular utility you are using. The goal of these exercises is to provide you with a basic
proficiency with your technology utility. Once you have mastered the techniques in these
exercises, you will be able to use your technology utility to solve many of the problems in
the regular exercise sets.

T1. The methods of Exercise 4 show that for the cat map, Π(p) is the smallest integer
satisfying the equation

\[
\begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix}^{\Pi(p)} \bmod p =
\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}
\]

This suggests that one way to determine Π(p) is to compute

\[
\begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix}^{n} \bmod p
\]

starting with n = 1 and stopping when this produces the identity matrix. Use this idea to
compute Π(p) for p = 2, 3, . . . , 10.


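Compare your results to the formulas given in Exercise 1, if they apply. What can you
conjecture about

\[
\begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix}^{\frac{1}{2}\Pi(p)} \bmod p
\]

when Π(p) is even?

T2. The eigenvalues and eigenvectors for the cat map matrix

\[
C = \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix}
\]

are

\[
\lambda_1 = \frac{3+\sqrt{5}}{2}, \quad
\lambda_2 = \frac{3-\sqrt{5}}{2}, \quad
\mathbf{v}_1 = \begin{bmatrix} 1 \\ \frac{1+\sqrt{5}}{2} \end{bmatrix}, \quad
\mathbf{v}_2 = \begin{bmatrix} 1 \\ \frac{1-\sqrt{5}}{2} \end{bmatrix}
\]

Using these eigenvalues and eigenvectors, we can define

\[
D = \begin{bmatrix} \frac{3+\sqrt{5}}{2} & 0 \\ 0 & \frac{3-\sqrt{5}}{2} \end{bmatrix}
\quad \text{and} \quad
P = \begin{bmatrix} 1 & 1 \\ \frac{1+\sqrt{5}}{2} & \frac{1-\sqrt{5}}{2} \end{bmatrix}
\]

and write $C = PDP^{-1}$; hence, $C^n = PD^nP^{-1}$. Use a computer to show that

\[
C^n = \begin{bmatrix} c_{11}^{(n)} & c_{12}^{(n)} \\ c_{21}^{(n)} & c_{22}^{(n)} \end{bmatrix}
\]

where

\[
c_{11}^{(n)} = \left(\frac{1+\sqrt{5}}{2\sqrt{5}}\right)\left(\frac{3-\sqrt{5}}{2}\right)^{n}
- \left(\frac{1-\sqrt{5}}{2\sqrt{5}}\right)\left(\frac{3+\sqrt{5}}{2}\right)^{n}
\]

\[
c_{22}^{(n)} = \left(\frac{1+\sqrt{5}}{2\sqrt{5}}\right)\left(\frac{3+\sqrt{5}}{2}\right)^{n}
- \left(\frac{1-\sqrt{5}}{2\sqrt{5}}\right)\left(\frac{3-\sqrt{5}}{2}\right)^{n}
\]

and

\[
c_{12}^{(n)} = c_{21}^{(n)} = \frac{1}{\sqrt{5}}\left[\left(\frac{3+\sqrt{5}}{2}\right)^{n}
- \left(\frac{3-\sqrt{5}}{2}\right)^{n}\right]
\]

How can you use these results and your conclusions in Exercise T1 to simplify the method
for computing Π(p)?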


10.14 Cryptography 

In this section we present a method of encoding and decoding messages. We also examine 
modular arithmetic and show how Gaussian elimination can sometimes be used to break an 
opponent’s code. 

Matrices 

Gaussian Elimination 

Matrix Operations 

Linear Independence 

Linear Transformations (Section 4.9) 

Ciphers

The study of encoding and decoding secret messages is called cryptography. Although
secret codes date to the earliest days of written communication, there has been a recent
surge of interest in the subject because of the need to maintain the privacy of information
transmitted over public lines of communication. In the language of cryptography, codes
are called ciphers, uncoded messages are called plaintext, and coded messages are called
ciphertext. The process of converting from plaintext to ciphertext is called enciphering,
and the reverse process of converting from ciphertext to plaintext is called deciphering.

The simplest ciphers, called substitution ciphers, are those that replace each letter of 
the alphabet by a different letter. For example, in the substitution cipher 

Plain   A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
Cipher  D E F G H I J K L M N O P Q R S T U V W X Y Z A B C

the plaintext letter A is replaced by D, the plaintext letter B by E, and so forth. With
this cipher the plaintext message

ROME WAS NOT BUILT IN A DAY

becomes

URPH ZDV QRW EXLOW LQ D GDB

Hill Ciphers
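
The substitution cipher in the table above shifts every letter three places forward in the alphabet, wrapping around at Z. The following Python sketch is an illustration of that table, not a routine taken from the text; the function name and the `shift` parameter are hypothetical.

```python
import string


def shift_encipher(plaintext, shift=3):
    """Replace each letter by the letter `shift` places later, wrapping at Z.

    Characters that are not letters (such as spaces) are left unchanged.
    """
    letters = string.ascii_uppercase
    result = []
    for ch in plaintext.upper():
        if ch in letters:
            result.append(letters[(letters.index(ch) + shift) % 26])
        else:
            result.append(ch)
    return "".join(result)


if __name__ == "__main__":
    print(shift_encipher("ROME WAS NOT BUILT IN A DAY"))
```

Deciphering uses the same function with the opposite shift, for example `shift_encipher(ciphertext, -3)`.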

A disadvantage of substitution ciphers is that they preserve the frequencies of individual 
letters, making it relatively easy to break the code by statistical methods. One way to 
overcome this problem is to divide the plaintext into groups of letters and encipher the 
plaintext group by group, rather than one letter at a time. A system of cryptography 
in which the plaintext is divided into sets of n letters, each of which is replaced by a 
set of n cipher letters, is called a polygraphic system. In this section we will study a 
class of polygraphic systems based on matrix transformations. [The ciphers that we will 
discuss are called Hill ciphers after Lester S. Hill, who introduced them in two papers: 
“Cryptography in an Algebraic Alphabet,” American Mathematical Monthly, 36 (June- 
July 1929), pp. 306-312; and “Concerning Certain Linear Transformation Apparatus of 
Cryptography,” American Mathematical Monthly, 38 (March 1931), pp. 135-154.] 

In the discussion to follow, we assume that each plaintext and ciphertext letter except
Z is assigned the numerical value that specifies its position in the standard alphabet
(Table 1). For reasons that will become clear later, Z is assigned a value of zero.


Table 1

A   B   C   D   E   F   G   H   I   J   K   L   M   N   O   P   Q   R   S   T   U   V   W   X   Y   Z
1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25  0


In the simplest Hill ciphers, successive pairs of plaintext are transformed into cipher- 
text by the following procedure: 

Step 1. Choose a 2 × 2 matrix with integer entries

\[
A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}
\]

to perform the encoding. Certain additional conditions on A will be imposed
later.

Step 2. Group successive plaintext letters into pairs, adding an arbitrary “dummy” letter
to fill out the last pair if the plaintext has an odd number of letters, and replace
each plaintext letter by its numerical value.

Step 3. Successively convert each plaintext pair $p_1 p_2$ into a column vector

\[
\mathbf{p} = \begin{bmatrix} p_1 \\ p_2 \end{bmatrix}
\]

and form the product Ap. We will call p a plaintext vector and Ap the corresponding
ciphertext vector.

Step 4. Convert each ciphertext vector into its alphabetic equivalent. 


► EXAMPLE 1 Hill Cipher of a Message 

Use the matrix

\[
\begin{bmatrix} 1 & 2 \\ 0 & 3 \end{bmatrix}
\]

to obtain the Hill cipher for the plaintext message

I AM HIDING

Solution If we group the plaintext into pairs and add the dummy letter G to fill out the
last pair, we obtain

IA MH ID IN GG

or, equivalently, from Table 1,

9 1   13 8   9 4   9 14   7 7


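To encipher the pair IA, we form the matrix product

\[
\begin{bmatrix} 1 & 2 \\ 0 & 3 \end{bmatrix}
\begin{bmatrix} 9 \\ 1 \end{bmatrix} =
\begin{bmatrix} 11 \\ 3 \end{bmatrix}
\]

which, from Table 1, yields the ciphertext KC.

To encipher the pair MH, we form the product

\[
\begin{bmatrix} 1 & 2 \\ 0 & 3 \end{bmatrix}
\begin{bmatrix} 13 \\ 8 \end{bmatrix} =
\begin{bmatrix} 29 \\ 24 \end{bmatrix} \tag{1}
\]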


However, there is a problem here, because the number 29 has no alphabet equivalent
(Table 1). To resolve this problem, we make the following agreement:

Whenever an integer greater than 25 occurs, it will be
replaced by the remainder that results when this
integer is divided by 26.

Because the remainder after division by 26 is one of the integers 0, 1, 2, . . . , 25, this 
procedure will always yield an integer with an alphabet equivalent. 

Thus, in (1) we replace 29 by 3, which is the remainder after dividing 29 by 26. It 
now follows from Table 1 that the ciphertext for the pair MH is CX. 

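The computations for the remaining ciphertext vectors are

\[
\begin{bmatrix} 1 & 2 \\ 0 & 3 \end{bmatrix}
\begin{bmatrix} 9 \\ 4 \end{bmatrix} =
\begin{bmatrix} 17 \\ 12 \end{bmatrix}
\]

\[
\begin{bmatrix} 1 & 2 \\ 0 & 3 \end{bmatrix}
\begin{bmatrix} 9 \\ 14 \end{bmatrix} =
\begin{bmatrix} 37 \\ 42 \end{bmatrix}
\quad \text{or} \quad
\begin{bmatrix} 11 \\ 16 \end{bmatrix}
\]

\[
\begin{bmatrix} 1 & 2 \\ 0 & 3 \end{bmatrix}
\begin{bmatrix} 7 \\ 7 \end{bmatrix} =
\begin{bmatrix} 21 \\ 21 \end{bmatrix}
\]

These correspond to the ciphertext pairs QL, KP, and UU, respectively. In summary,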
the entire ciphertext message is 

KC CX QL KP UU 

which would usually be transmitted as a single string without spaces: 

KCCXQLKPUU ◄ 


Because the plaintext was grouped in pairs and enciphered by a 2 x 2 matrix, the 
Hill cipher in Example 1 is referred to as a Hill 2-cipher. It is obviously also possible to 
group the plaintext in triples and encipher by a 3 x 3 matrix with integer entries; this is 
called a Hill 3-cipher. In general, for a Hill n-cipher, plaintext is grouped into sets of n 
letters and enciphered by an n x n matrix with integer entries. 
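
The four steps of the procedure, together with the agreement to replace integers greater than 25 by their remainders after division by 26, can be sketched in Python as follows. This is an illustrative sketch rather than a definitive implementation: the function names are hypothetical, the key is passed as a pair of rows, and the dummy letter is taken to be a repeat of the last plaintext letter (which matches the choice of G in Example 1, although the text allows any dummy letter).

```python
def letter_to_number(ch):
    """Table 1 values: A = 1, B = 2, ..., Y = 25, Z = 0."""
    return (ord(ch) - ord("A") + 1) % 26


def number_to_letter(k):
    """Inverse of letter_to_number for k in 0, 1, ..., 25."""
    return chr((k - 1) % 26 + ord("A"))


def hill2_encipher(plaintext, key):
    """Encipher with a Hill 2-cipher; key is ((a11, a12), (a21, a22))."""
    letters = [c for c in plaintext.upper() if "A" <= c <= "Z"]
    if len(letters) % 2 == 1:
        letters.append(letters[-1])  # assumed dummy: repeat the last letter
    (a11, a12), (a21, a22) = key
    cipher = []
    for i in range(0, len(letters), 2):
        p1 = letter_to_number(letters[i])
        p2 = letter_to_number(letters[i + 1])
        # Form the product A p and reduce each entry modulo 26.
        cipher.append(number_to_letter((a11 * p1 + a12 * p2) % 26))
        cipher.append(number_to_letter((a21 * p1 + a22 * p2) % 26))
    return "".join(cipher)


if __name__ == "__main__":
    print(hill2_encipher("I AM HIDING", ((1, 2), (0, 3))))  # KCCXQLKPUU
```

Extending the sketch to a Hill n-cipher would mean grouping the letters in sets of n and replacing the two explicit rows by a general matrix-vector product.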

In Example 1, integers greater than 25 were replaced by their remainders after division 
by 26. This technique of working with remainders is at the core of a body of mathematics 
called modular arithmetic. Because of its importance in cryptography, we will digress for 
a moment to touch on some of the main ideas in this area. 
In modular arithmetic we are given a positive integer m, called the modulus, and
any two integers whose difference is an integer multiple of the modulus are regarded
as “equal” or “equivalent” with respect to the modulus. More precisely, we make the
following definition.


DEFINITION 1 If m is a positive integer and a and b are any integers, then we say that
a is equivalent to b modulo m, written

a = b (mod m)

if a − b is an integer multiple of m.


► EXAMPLE 2 Various Equivalences

7 = 2 (mod 5)

19 = 3 (mod 2)

−1 = 25 (mod 26)

12 = 0 (mod 4)


For any modulus m it can be proved that every integer a is equivalent, modulo m, to
exactly one of the integers

0, 1, 2, . . . , m − 1

We call this integer the residue of a modulo m, and we write

$Z_m = \{0, 1, 2, \ldots, m - 1\}$

to denote the set of residues modulo m.

If a is a nonnegative integer, then its residue modulo m is simply the remainder that 
results when a is divided by m . For an arbitrary integer a, the residue can be found using 
the following theorem. 


For any integer a and modulus m, let 


R = remainder of 

m 


Then the residue r of a modulo m is given by 


R 


r = 


m — R 


0 


if a > 0 

if a < 0 and R 0 
if a < 0 and R — 0 


EXAMPLE 3 Residues mod 26

Find the residue modulo 26 of (a) 87, (b) $-38$, and (c) $-26$.

Solution (a) Dividing $|87| = 87$ by 26 yields a remainder of $R = 9$, so $r = 9$. Thus,

\[ 87 = 9 \pmod{26} \]

Solution (b) Dividing $|-38| = 38$ by 26 yields a remainder of $R = 12$, so $r = 26 - 12 = 14$. Thus,

\[ -38 = 14 \pmod{26} \]

Solution (c) Dividing $|-26| = 26$ by 26 yields a remainder of $R = 0$. Thus,

\[ -26 = 0 \pmod{26} \] ◄
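The case analysis in the theorem translates directly into code. The following Python sketch (the function name `residue` is hypothetical) follows the theorem step by step and then compares the result with Python's built-in `%` operator, which for a positive modulus already returns a value in $0, 1, \ldots, m - 1$.

```python
def residue(a: int, m: int) -> int:
    """Residue of a modulo m, following the three cases of the theorem."""
    if m <= 0:
        raise ValueError("the modulus must be a positive integer")
    R = abs(a) % m              # remainder of |a| divided by m
    if a >= 0:
        return R
    return m - R if R != 0 else 0

for a in (87, -38, -26):
    # For positive m, Python's % operator gives the same residue directly
    assert residue(a, 26) == a % 26
    print(a, "has residue", residue(a, 26), "modulo 26")
```

In practice one would simply write `a % 26`; the explicit version is shown only to mirror the theorem.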

In ordinary arithmetic every nonzero number $a$ has a reciprocal or multiplicative
inverse, denoted by $a^{-1}$, such that

\[ a a^{-1} = a^{-1} a = 1 \]

In modular arithmetic we have the following corresponding concept:


DEFINITION 2 If $a$ is a number in $Z_m$, then a number $a^{-1}$ in $Z_m$ is called a reciprocal
or multiplicative inverse of $a$ modulo $m$ if $a a^{-1} = a^{-1} a = 1 \pmod{m}$.


It can be proved that if a and m have no common prime factors, then a has a unique 
reciprocal modulo m; conversely, if a and m have a common prime factor, then a has no 
reciprocal modulo m . 


EXAMPLE 4 Reciprocal of 3 mod 26

The number 3 has a reciprocal modulo 26 because 3 and 26 have no common prime
factors. This reciprocal can be obtained by finding the number $x$ in $Z_{26}$ that satisfies the
modular equation

\[ 3x = 1 \pmod{26} \]

Although there are general methods for solving such modular equations, it would take 
us too far afield to study them. However, because 26 is relatively small, this equation 
can be solved by trying the possible solutions, 0 to 25, one at a time. With this approach 
we find that $x = 9$ is the solution, because

\[ 3 \cdot 9 = 27 = 1 \pmod{26} \]

Thus,

\[ 3^{-1} = 9 \pmod{26} \]


► EXAMPLE 5 A Number with No Reciprocal mod 26

The number 4 has no reciprocal modulo 26, because 4 and 26 have 2 as a common prime
factor (see Exercise 9). ◄


For future reference, in Table 2 we provide the following reciprocals modulo 26:

Table 2 Reciprocals Modulo 26

a       | 1 | 3 | 5  | 7  | 9 | 11 | 15 | 17 | 19 | 21 | 23 | 25
a^{-1}  | 1 | 9 | 21 | 15 | 3 | 19 | 7  | 23 | 11 | 5  | 17 | 25
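The trial-and-error search described in Example 4 is easy to automate. The sketch below (the function name `reciprocal` is hypothetical) tries every candidate in $Z_m$ and collects the residues that have a reciprocal modulo 26; under these assumptions it should reproduce the entries of Table 2 and confirm that 4 has no reciprocal.

```python
def reciprocal(a, m):
    """Return x in Z_m with a*x = 1 (mod m), found by trying 0, 1, ..., m-1; None if absent."""
    for x in range(m):
        if (a * x) % m == 1:
            return x
    return None

# Residues modulo 26 that possess a reciprocal, paired with that reciprocal
table = {}
for a in range(26):
    inv = reciprocal(a, 26)
    if inv is not None:
        table[a] = inv

for a, inv in table.items():
    print(a, inv)
print("reciprocal of 4:", reciprocal(4, 26))
```

For larger moduli, Python 3.8 and later also accept `pow(a, -1, m)`, which computes the same reciprocal without a linear search.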


Deciphering

Every useful cipher must have a procedure for decipherment. In the case of a Hill cipher,
decipherment uses the inverse (mod 26) of the enciphering matrix. To be precise, if $m$
is a positive integer, then a square matrix $A$ with entries in $Z_m$ is said to be invertible
modulo $m$ if there is a matrix $B$ with entries in $Z_m$ such that

\[ AB = BA = I \pmod{m} \]

Suppose now that

\[ A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} \]

is invertible modulo 26 and this matrix is used in a Hill 2-cipher. If

\[ \mathbf{p} = \begin{bmatrix} p_1 \\ p_2 \end{bmatrix} \]

is a plaintext vector, then

\[ \mathbf{c} = A\mathbf{p} \pmod{26} \]

is the corresponding ciphertext vector and

\[ \mathbf{p} = A^{-1}\mathbf{c} \pmod{26} \]

Thus, each plaintext vector can be recovered from the corresponding ciphertext vector
by multiplying it on the left by $A^{-1}$ (mod 26).

In cryptography it is important to know which matrices are invertible modulo 26 and 
how to obtain their inverses. We now investigate these questions. 

In ordinary arithmetic, a square matrix $A$ is invertible if and only if $\det(A) \ne 0$, or,
equivalently, if and only if $\det(A)$ has a reciprocal. The following theorem is the analog
of this result in modular arithmetic.


A square matrix A with entries in Z m is invertible modulo m if and 
only if the residue of det(A) modulo m has a reciprocal modulo m. 


Because the residue of det(A) modulo m will have a reciprocal modulo m if and only 
if this residue and m have no common prime factors, we have the following corollary. 


COROLLARY 10.14.3 A square matrix A with entries in Z m is invertible modulo m if 
and only if m and the residue of det(A) modulo m have no common prime factors. 


Because the only prime factors of $m = 26$ are 2 and 13, we have the following corollary,
which is useful in cryptography.


A square matrix $A$ with entries in $Z_{26}$ is invertible modulo 26 if
and only if the residue of $\det(A)$ modulo 26 is not divisible by 2 or 13.
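The invertibility criterion can be checked mechanically: the residue of $\det(A)$ and the modulus have no common prime factors exactly when their greatest common divisor is 1. The Python sketch below makes that test explicit; the helper names `det_int` and `is_invertible_mod` are hypothetical, and the cofactor expansion is chosen for clarity rather than speed, so it is only suitable for small matrices.

```python
from math import gcd

def det_int(A):
    """Integer determinant by cofactor expansion along the first row (small n only)."""
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0
    for j in range(n):
        # Minor obtained by deleting row 0 and column j
        minor = [row[:j] + row[j + 1:] for row in A[1:]]
        total += (-1) ** j * A[0][j] * det_int(minor)
    return total

def is_invertible_mod(A, m=26):
    """True when the residue of det(A) modulo m shares no prime factor with m."""
    return gcd(det_int(A) % m, m) == 1

print(is_invertible_mod([[5, 6], [2, 3]]))    # determinant 3
print(is_invertible_mod([[4, 2], [1, 1]]))    # determinant 2, divisible by 2
print(is_invertible_mod([[1, 2], [0, 13]]))   # determinant 13, divisible by 13
```

With a prime modulus such as 29, the same function reports a matrix as invertible whenever its determinant is not a multiple of the modulus.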


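We leave it for you to verify that if

\[ A = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \]

has entries in $Z_{26}$ and the residue of $\det(A) = ad - bc$ modulo 26 is not divisible by 2
or 13, then the inverse of $A$ (mod 26) is given by

\[ A^{-1} = (ad - bc)^{-1} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix} \pmod{26} \tag{2} \]

where $(ad - bc)^{-1}$ is the reciprocal of the residue of $ad - bc$ (mod 26).

► EXAMPLE 6 Inverse of a Matrix mod 26

Find the inverse of

\[ A = \begin{bmatrix} 5 & 6 \\ 2 & 3 \end{bmatrix} \]

modulo 26.

Solution

\[ \det(A) = ad - bc = 5 \cdot 3 - 6 \cdot 2 = 3 \]

so from Table 2,

\[ (ad - bc)^{-1} = 3^{-1} = 9 \pmod{26} \]

Thus, from (2),

\[ A^{-1} = 9 \begin{bmatrix} 3 & -6 \\ -2 & 5 \end{bmatrix} = \begin{bmatrix} 27 & -54 \\ -18 & 45 \end{bmatrix} = \begin{bmatrix} 1 & 24 \\ 8 & 19 \end{bmatrix} \pmod{26} \]

As a check,

\[ AA^{-1} = \begin{bmatrix} 5 & 6 \\ 2 & 3 \end{bmatrix} \begin{bmatrix} 1 & 24 \\ 8 & 19 \end{bmatrix} = \begin{bmatrix} 53 & 234 \\ 26 & 105 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \pmod{26} \]

Similarly, $A^{-1}A = I$.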
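Formula (2) can be sketched in Python as follows. The function name `inverse_2x2_mod` is hypothetical, and the reciprocal of the determinant is obtained with the built-in `pow(x, -1, m)` (available in Python 3.8 and later) instead of a lookup in Table 2; that substitution is a convenience of this sketch, not the method used in the text.

```python
def inverse_2x2_mod(A, m=26):
    """Inverse of a 2 x 2 integer matrix modulo m, following Formula (2)."""
    (a, b), (c, d) = A
    det = (a * d - b * c) % m
    try:
        det_inv = pow(det, -1, m)      # reciprocal of the residue of ad - bc
    except ValueError:
        raise ValueError("matrix is not invertible modulo %d" % m)
    return [[(det_inv * d) % m, (det_inv * -b) % m],
            [(det_inv * -c) % m, (det_inv * a) % m]]

def matmul_mod(X, Y, m=26):
    """Product of two 2 x 2 matrices with entries reduced modulo m."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) % m for j in range(2)]
            for i in range(2)]

A = [[5, 6], [2, 3]]
A_inv = inverse_2x2_mod(A)
print(A_inv)
print(matmul_mod(A, A_inv), matmul_mod(A_inv, A))
```

Applied to the matrix of Example 6, the two products printed on the last line should both be the identity matrix, which is the check carried out by hand above.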


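EXAMPLE 7 Decoding a Hill 2-Cipher

Decode the following Hill 2-cipher, which was enciphered by the matrix in Example 6:

GTNKGKDUSK

Solution From Table 1 the numerical equivalent of this ciphertext is

7 20   14 11   7 11   4 21   19 11

To obtain the plaintext pairs, we multiply each ciphertext vector by the inverse of $A$
(obtained in Example 6):

\[ \begin{bmatrix} 1 & 24 \\ 8 & 19 \end{bmatrix} \begin{bmatrix} 7 \\ 20 \end{bmatrix} = \begin{bmatrix} 487 \\ 436 \end{bmatrix} = \begin{bmatrix} 19 \\ 20 \end{bmatrix} \pmod{26} \]

\[ \begin{bmatrix} 1 & 24 \\ 8 & 19 \end{bmatrix} \begin{bmatrix} 14 \\ 11 \end{bmatrix} = \begin{bmatrix} 278 \\ 321 \end{bmatrix} = \begin{bmatrix} 18 \\ 9 \end{bmatrix} \pmod{26} \]

\[ \begin{bmatrix} 1 & 24 \\ 8 & 19 \end{bmatrix} \begin{bmatrix} 7 \\ 11 \end{bmatrix} = \begin{bmatrix} 271 \\ 265 \end{bmatrix} = \begin{bmatrix} 11 \\ 5 \end{bmatrix} \pmod{26} \]

\[ \begin{bmatrix} 1 & 24 \\ 8 & 19 \end{bmatrix} \begin{bmatrix} 4 \\ 21 \end{bmatrix} = \begin{bmatrix} 508 \\ 431 \end{bmatrix} = \begin{bmatrix} 14 \\ 15 \end{bmatrix} \pmod{26} \]

\[ \begin{bmatrix} 1 & 24 \\ 8 & 19 \end{bmatrix} \begin{bmatrix} 19 \\ 11 \end{bmatrix} = \begin{bmatrix} 283 \\ 361 \end{bmatrix} = \begin{bmatrix} 23 \\ 23 \end{bmatrix} \pmod{26} \]

From Table 1, the alphabet equivalents of these vectors are

ST RI KE NO WW

which yields the message

STRIKE NOW ◄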
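The decoding in Example 7 can be reproduced with a short Python sketch. The helper names are hypothetical, and the letter numbering A = 1, ..., Y = 25, Z = 0 is the convention assumed for Table 1. The same routine enciphers or deciphers, depending on whether it is given $A$ or $A^{-1}$.

```python
def to_numbers(text):
    # Assumed Table 1 convention: A=1, ..., Y=25, Z=0
    return [(ord(ch) - ord("A") + 1) % 26 for ch in text]

def to_letters(nums):
    return "".join(chr((k - 1) % 26 + ord("A")) for k in nums)

def hill2_apply(matrix, text):
    """Multiply each pair of letters (as a column vector) by a 2 x 2 matrix, mod 26."""
    nums = to_numbers(text)
    if len(nums) % 2 != 0:
        raise ValueError("text must contain an even number of letters")
    out = []
    for i in range(0, len(nums), 2):
        x, y = nums[i], nums[i + 1]
        out.append((matrix[0][0] * x + matrix[0][1] * y) % 26)
        out.append((matrix[1][0] * x + matrix[1][1] * y) % 26)
    return to_letters(out)

A = [[5, 6], [2, 3]]          # enciphering matrix of Example 6
A_inv = [[1, 24], [8, 19]]    # its inverse modulo 26

plaintext = hill2_apply(A_inv, "GTNKGKDUSK")
print(plaintext)
print(hill2_apply(A, plaintext))   # re-enciphering should return the ciphertext
```

The final W in the decoded string is the dummy letter used to complete the last pair.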

Breaking a Hill Cipher


Because the purpose of enciphering messages and information is to prevent “opponents” 
from learning their contents, cryptographers are concerned with the security of their 
ciphers — that is, how readily they can be broken (deciphered by their opponents). We 
will conclude this section by discussing one technique for breaking Hill ciphers. 

Suppose that you are able to obtain some corresponding plaintext and ciphertext 
from an opponent’s message. For example, on examining some intercepted ciphertext, 
you may be able to deduce that the message is a letter that begins DEAR SIR. We will show 
that with a small amount of such data, it may be possible to determine the deciphering 
matrix of a Hill code and consequently obtain access to the rest of the message. 

It is a basic result in linear algebra that a linear transformation is completely determined
by its values at a basis. This principle suggests that if we have a Hill $n$-cipher, and

\[ \mathbf{p}_1, \mathbf{p}_2, \ldots, \mathbf{p}_n \]

are linearly independent plaintext vectors whose corresponding ciphertext vectors

\[ A\mathbf{p}_1, A\mathbf{p}_2, \ldots, A\mathbf{p}_n \]

are known, then there is enough information available to determine the matrix $A$ and
hence $A^{-1}$ (mod $m$).

The following theorem, whose proof is discussed in the exercises, provides a way to 
do this. 


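THEOREM 10.14.5 Determining the Deciphering Matrix

Let $\mathbf{p}_1, \mathbf{p}_2, \ldots, \mathbf{p}_n$ be linearly independent plaintext vectors, and let $\mathbf{c}_1, \mathbf{c}_2, \ldots, \mathbf{c}_n$ be the
corresponding ciphertext vectors in a Hill $n$-cipher. If

\[ P = \begin{bmatrix} \mathbf{p}_1^T \\ \mathbf{p}_2^T \\ \vdots \\ \mathbf{p}_n^T \end{bmatrix} \]

is the $n \times n$ matrix with row vectors $\mathbf{p}_1^T, \mathbf{p}_2^T, \ldots, \mathbf{p}_n^T$ and if

\[ C = \begin{bmatrix} \mathbf{c}_1^T \\ \mathbf{c}_2^T \\ \vdots \\ \mathbf{c}_n^T \end{bmatrix} \]

is the $n \times n$ matrix with row vectors $\mathbf{c}_1^T, \mathbf{c}_2^T, \ldots, \mathbf{c}_n^T$, then the sequence of elementary
row operations that reduces $C$ to $I$ transforms $P$ to $(A^{-1})^T$.


This theorem tells us that to find the transpose of the deciphering matrix $A^{-1}$, we
must find a sequence of row operations that reduces $C$ to $I$ and then perform this same
sequence of operations on $P$. The following example illustrates a simple algorithm for
doing this.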

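► EXAMPLE 8 Using Theorem 10.14.5

The following Hill 2-cipher is intercepted:

IOSBTGXESPXHOPDE

Decipher the message, given that it starts with the word DEAR.

Solution From Table 1, the numerical equivalent of the known plaintext is

DE    AR
4 5   1 18

and the numerical equivalent of the corresponding ciphertext is

IO     SB
9 15   19 2

so the corresponding plaintext and ciphertext vectors are

\[ \mathbf{p}_1 = \begin{bmatrix} 4 \\ 5 \end{bmatrix} \leftrightarrow \mathbf{c}_1 = \begin{bmatrix} 9 \\ 15 \end{bmatrix}, \qquad \mathbf{p}_2 = \begin{bmatrix} 1 \\ 18 \end{bmatrix} \leftrightarrow \mathbf{c}_2 = \begin{bmatrix} 19 \\ 2 \end{bmatrix} \]

We want to reduce

\[ C = \begin{bmatrix} \mathbf{c}_1^T \\ \mathbf{c}_2^T \end{bmatrix} = \begin{bmatrix} 9 & 15 \\ 19 & 2 \end{bmatrix} \]

to $I$ by elementary row operations and simultaneously apply these operations to

\[ P = \begin{bmatrix} \mathbf{p}_1^T \\ \mathbf{p}_2^T \end{bmatrix} = \begin{bmatrix} 4 & 5 \\ 1 & 18 \end{bmatrix} \]

to obtain $(A^{-1})^T$ (the transpose of the deciphering matrix). This can be accomplished by
adjoining $P$ to the right of $C$ and applying row operations to the resulting matrix $[C \mid P]$
until the left side is reduced to $I$. The final matrix will then have the form $[I \mid (A^{-1})^T]$.
The computations can be carried out as follows:

\[ \left[\begin{array}{cc|cc} 9 & 15 & 4 & 5 \\ 19 & 2 & 1 & 18 \end{array}\right] \]
We formed the matrix $[C \mid P]$.

\[ \left[\begin{array}{cc|cc} 1 & 45 & 12 & 15 \\ 19 & 2 & 1 & 18 \end{array}\right] \]
We multiplied the first row by $9^{-1} = 3$.

\[ \left[\begin{array}{cc|cc} 1 & 19 & 12 & 15 \\ 19 & 2 & 1 & 18 \end{array}\right] \]
We replaced 45 by its residue modulo 26.

\[ \left[\begin{array}{cc|cc} 1 & 19 & 12 & 15 \\ 0 & -359 & -227 & -267 \end{array}\right] \]
We added $-19$ times the first row to the second.

\[ \left[\begin{array}{cc|cc} 1 & 19 & 12 & 15 \\ 0 & 5 & 7 & 19 \end{array}\right] \]
We replaced the entries in the second row by their residues modulo 26.

\[ \left[\begin{array}{cc|cc} 1 & 19 & 12 & 15 \\ 0 & 1 & 147 & 399 \end{array}\right] \]
We multiplied the second row by $5^{-1} = 21$.

\[ \left[\begin{array}{cc|cc} 1 & 19 & 12 & 15 \\ 0 & 1 & 17 & 9 \end{array}\right] \]
We replaced the entries in the second row by their residues modulo 26.

\[ \left[\begin{array}{cc|cc} 1 & 0 & -311 & -156 \\ 0 & 1 & 17 & 9 \end{array}\right] \]
We added $-19$ times the second row to the first.

\[ \left[\begin{array}{cc|cc} 1 & 0 & 1 & 0 \\ 0 & 1 & 17 & 9 \end{array}\right] \]
We replaced the entries in the first row by their residues modulo 26.

Thus,

\[ (A^{-1})^T = \begin{bmatrix} 1 & 0 \\ 17 & 9 \end{bmatrix} \]

so the deciphering matrix is

\[ A^{-1} = \begin{bmatrix} 1 & 17 \\ 0 & 9 \end{bmatrix} \]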

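To decipher the message, we first group the ciphertext into pairs and find the numerical
equivalent of each letter:

IO     SB     TG     XE     SP      XH     OP      DE
9 15   19 2   20 7   24 5   19 16   24 8   15 16   4 5

Next, we multiply successive ciphertext vectors on the left by $A^{-1}$ and find the alphabet
equivalents of the resulting plaintext pairs (all computations are mod 26):

\[ \begin{bmatrix} 1 & 17 \\ 0 & 9 \end{bmatrix} \begin{bmatrix} 9 \\ 15 \end{bmatrix} = \begin{bmatrix} 4 \\ 5 \end{bmatrix} \quad \begin{matrix} \text{D} \\ \text{E} \end{matrix} \qquad \begin{bmatrix} 1 & 17 \\ 0 & 9 \end{bmatrix} \begin{bmatrix} 19 \\ 2 \end{bmatrix} = \begin{bmatrix} 1 \\ 18 \end{bmatrix} \quad \begin{matrix} \text{A} \\ \text{R} \end{matrix} \]

\[ \begin{bmatrix} 1 & 17 \\ 0 & 9 \end{bmatrix} \begin{bmatrix} 20 \\ 7 \end{bmatrix} = \begin{bmatrix} 9 \\ 11 \end{bmatrix} \quad \begin{matrix} \text{I} \\ \text{K} \end{matrix} \qquad \begin{bmatrix} 1 & 17 \\ 0 & 9 \end{bmatrix} \begin{bmatrix} 24 \\ 5 \end{bmatrix} = \begin{bmatrix} 5 \\ 19 \end{bmatrix} \quad \begin{matrix} \text{E} \\ \text{S} \end{matrix} \]

\[ \begin{bmatrix} 1 & 17 \\ 0 & 9 \end{bmatrix} \begin{bmatrix} 19 \\ 16 \end{bmatrix} = \begin{bmatrix} 5 \\ 14 \end{bmatrix} \quad \begin{matrix} \text{E} \\ \text{N} \end{matrix} \qquad \begin{bmatrix} 1 & 17 \\ 0 & 9 \end{bmatrix} \begin{bmatrix} 24 \\ 8 \end{bmatrix} = \begin{bmatrix} 4 \\ 20 \end{bmatrix} \quad \begin{matrix} \text{D} \\ \text{T} \end{matrix} \]

\[ \begin{bmatrix} 1 & 17 \\ 0 & 9 \end{bmatrix} \begin{bmatrix} 15 \\ 16 \end{bmatrix} = \begin{bmatrix} 1 \\ 14 \end{bmatrix} \quad \begin{matrix} \text{A} \\ \text{N} \end{matrix} \qquad \begin{bmatrix} 1 & 17 \\ 0 & 9 \end{bmatrix} \begin{bmatrix} 4 \\ 5 \end{bmatrix} = \begin{bmatrix} 11 \\ 19 \end{bmatrix} \quad \begin{matrix} \text{K} \\ \text{S} \end{matrix} \]

Finally, we construct the message from the plaintext pairs:

DE AR IK ES EN DT AN KS
DEAR IKE SEND TANKS ◄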
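The row-reduction procedure of Example 8 can be sketched in Python as follows. The function name `recover_deciphering_matrix` is hypothetical. One simplifying assumption is labeled in the code: the sketch only uses pivots that have a reciprocal modulo 26, so it may report failure for some matrices $C$ that are nevertheless invertible modulo 26; in that case a different choice of known plaintext pairs would be needed.

```python
from math import gcd

def recover_deciphering_matrix(plain_vectors, cipher_vectors, m=26):
    """Reduce [C | P] to [I | (A^-1)^T] modulo m and return A^-1."""
    n = len(plain_vectors)
    # Row i of the augmented matrix is [c_i^T | p_i^T]
    rows = [list(c) + list(p) for c, p in zip(cipher_vectors, plain_vectors)]
    for col in range(n):
        # Assumption of this sketch: look only for a pivot with a reciprocal mod m
        pivot = next((r for r in range(col, n) if gcd(rows[r][col] % m, m) == 1), None)
        if pivot is None:
            raise ValueError("no invertible pivot found; try other plaintext pairs")
        rows[col], rows[pivot] = rows[pivot], rows[col]
        inv = pow(rows[col][col], -1, m)              # Python 3.8+
        rows[col] = [(inv * x) % m for x in rows[col]]
        for r in range(n):
            if r != col:
                factor = rows[r][col]
                rows[r] = [(x - factor * y) % m for x, y in zip(rows[r], rows[col])]
    transposed = [row[n:] for row in rows]            # this is (A^-1)^T
    return [[transposed[j][i] for j in range(n)] for i in range(n)]

# Known plaintext DE, AR and the corresponding ciphertext IO, SB from Example 8
A_inv = recover_deciphering_matrix([(4, 5), (1, 18)], [(9, 15), (19, 2)])
print(A_inv)
```

Because residues are taken after every operation, the intermediate entries stay in $Z_{26}$ instead of growing to values such as $-359$, but the final matrix agrees with the hand computation.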


Further Readings


Readers interested in learning more about mathematical cryptography are referred to the following
books, the first of which is elementary and the second more advanced.

1. ABRAHAM SINKOV, Elementary Cryptanalysis, a Mathematical Approach (Mathematical Association
of America, 2009).

2. ALAN G. KONHEIM, Cryptography, a Primer (New York: Wiley-Interscience, 1981).

Exercise Set 10.14

1. Obtain the Hill cipher of the message 
DARK NIGHT 


for each of the following enciphering matrices: 


(a) 


1 

2 


3' 

1 



3' 

2 


symbols would be available and all matrix arithmetic would 
be done modulo 29. Under what conditions would a matrix 
with entries in Z 29 be invertible modulo 29? 

9. Show that the modular equation $4x = 1 \pmod{26}$ has no
solution in $Z_{26}$ by successively substituting the values
$x = 0, 1, 2, \ldots, 25$.


2. In each part determine whether the matrix is invertible modulo 26.
If so, find its inverse modulo 26 and check your work
by verifying that $AA^{-1} = A^{-1}A = I \pmod{26}$.



'9 f 


'3 f 

r 

(a) A = 

1 2 

(b) A = 

.5 3_ 

(c) A = 


'2 f 


'3 F 

r 

(d) A = 

.1 7 . 

(e) A = 

.6 2 

(f) A = 


3. Decode the message 

SAKNOXA OJX 


given that it is a Hill cipher with enciphering matrix 

"4 1" 

.3 

4. A Hill 2-cipher is intercepted that starts with the pairs 

SL HK 

Find the deciphering and enciphering matrices, given that the 
plaintext is known to start with the word ARMY. 

5. Decode the following Hill 2-cipher if the last four plaintext 
letters are known to be ATOM. 

LNGIHG YB VRENJYQO 

6. Decode the following Hill 3-cipher if the first nine plaintext 
letters are IHAVECOME: 

HPAFQGGD UGDDHPGODYNOR 


7. All of the results of this section can be generalized to the case
where the plaintext is a binary message; that is, it is a sequence
of 0’s and 1’s. In this case we do all of our modular arithmetic
using modulus 2 rather than modulus 26. Thus, for example,
$1 + 1 = 0 \pmod{2}$. Suppose we want to encrypt the message
110101111. Let us first break it into triplets to form the three vectors

\[ \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}, \quad \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}, \quad \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} \]

and let us take

\[ \begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix} \]

as the enciphering matrix.

(a) Find the encoded message. 

(b) Find the inverse modulo 2 of the enciphering matrix, and 
verify that it decodes your encoded message. 

8. If, in addition to the standard alphabet, a period, comma, and 
question mark were allowed, then 29 plaintext and ciphertext 


10. (a) Let $P$ and $C$ be the matrices in Theorem 10.14.5. Show
that $P = C(A^{-1})^T$.

(b) To prove Theorem 10.14.5, let $E_1, E_2, \ldots, E_n$ be the elementary
matrices that correspond to the row operations
that reduce $C$ to $I$, so

\[ E_n \cdots E_2 E_1 C = I \]

Show that

\[ E_n \cdots E_2 E_1 P = (A^{-1})^T \]

from which it follows that the same sequence of row operations
that reduces $C$ to $I$ converts $P$ to $(A^{-1})^T$.

11. (a) If $A$ is the enciphering matrix of a Hill $n$-cipher, show that

\[ A^{-1} = (C^{-1}P)^T \pmod{26} \]

where $C$ and $P$ are the matrices defined in Theorem
10.14.5.

(b) Instead of using Theorem 10.14.5 as in the text, find the
deciphering matrix $A^{-1}$ of Example 8 by using the result
in part (a) and Equation (2) to compute $C^{-1}$. [Note: Although
this method is practical for Hill 2-ciphers, Theorem 10.14.5
is more efficient for Hill $n$-ciphers with $n > 2$.]

Working with Technology 

The following exercises are designed to be solved using a technology
utility. Typically, this will be MATLAB, Mathematica, Maple,
Derive, or Mathcad, but it may also be some other type of linear
algebra software or a scientific calculator with some linear algebra 
capabilities. For each exercise you will need to read the relevant 
documentation for the particular utility you are using. The goal 
of these exercises is to provide you with a basic proficiency with 
your technology utility. Once you have mastered the techniques 
in these exercises, you will be able to use your technology utility 
to solve many of the problems in the regular exercise sets. 

T1. Two integers that have no common factors (except 1) are
said to be relatively prime. Given a positive integer $n$, let
$S_n = \{a_1, a_2, a_3, \ldots, a_m\}$, where $a_1 < a_2 < a_3 < \cdots < a_m$, be the set
of all positive integers less than $n$ and relatively prime to $n$. For
example, if $n = 9$, then

\[ S_9 = \{a_1, a_2, a_3, \ldots, a_6\} = \{1, 2, 4, 5, 7, 8\} \]

(a) Construct a table consisting of $n$ and $S_n$ for $n = 2, 3, \ldots, 15$,
and then compute

\[ \sum_{k=1}^{m} a_k \quad \text{and} \quad \left( \sum_{k=1}^{m} a_k \right) \pmod{n} \]

in each case. Draw a conjecture for $n > 15$ and prove your
conjecture to be true. [Hint: Use the fact that if $a$ is relatively
prime to $n$, then $n - a$ is also relatively prime to $n$.]

(b) Given a positive integer $n$ and the set $S_n$, let $P_n$ be the $m \times m$
matrix

\[ P_n = \begin{bmatrix} a_1 & a_2 & a_3 & \cdots & a_{m-1} & a_m \\ a_2 & a_3 & a_4 & \cdots & a_m & a_1 \\ a_3 & a_4 & a_5 & \cdots & a_1 & a_2 \\ \vdots & \vdots & \vdots & & \vdots & \vdots \\ a_{m-1} & a_m & a_1 & \cdots & a_{m-3} & a_{m-2} \\ a_m & a_1 & a_2 & \cdots & a_{m-2} & a_{m-1} \end{bmatrix} \]

so that, for example,

\[ P_9 = \begin{bmatrix} 1 & 2 & 4 & 5 & 7 & 8 \\ 2 & 4 & 5 & 7 & 8 & 1 \\ 4 & 5 & 7 & 8 & 1 & 2 \\ 5 & 7 & 8 & 1 & 2 & 4 \\ 7 & 8 & 1 & 2 & 4 & 5 \\ 8 & 1 & 2 & 4 & 5 & 7 \end{bmatrix} \]

Use a computer to compute $\det(P_n)$ and $\det(P_n) \pmod{n}$ for
$n = 2, 3, \ldots, 15$, and then use these results to construct a conjecture.

(c) Use the results of part (a) to prove your conjecture to be true.
[Hint: Add the first $m - 1$ rows of $P_n$ to its last row and then
use Theorem 2.2.3.] What do these results imply about the
inverse of $P_n$ (mod $n$)?


T2. Given a positive integer $n$ greater than 1, the number of positive
integers less than $n$ and relatively prime to $n$ is called the Euler
phi function of $n$ and is denoted by $\varphi(n)$. For example, $\varphi(6) = 2$
since only two positive integers (1 and 5) are less than 6 and have
no common factor with 6.

(a) Using a computer, for each value of $n = 2, 3, \ldots, 25$ compute
and print out all positive integers that are less than $n$ and
relatively prime to $n$. Then use these integers to determine
the values of $\varphi(n)$ for $n = 2, 3, \ldots, 25$. Can you discover a
pattern in the results?

(b) It can be shown that if $\{p_1, p_2, p_3, \ldots, p_m\}$ are all the distinct
prime factors of $n$, then

\[ \varphi(n) = n \left(1 - \frac{1}{p_1}\right) \left(1 - \frac{1}{p_2}\right) \left(1 - \frac{1}{p_3}\right) \cdots \left(1 - \frac{1}{p_m}\right) \]

For example, since $\{2, 3\}$ are the distinct prime factors of 12,
we have

\[ \varphi(12) = 12 \left(1 - \frac{1}{2}\right) \left(1 - \frac{1}{3}\right) = 4 \]

which agrees with the fact that $\{1, 5, 7, 11\}$ are the only positive
integers less than 12 and relatively prime to 12. Using a computer,
print out all the prime factors of $n$ for $n = 2, 3, \ldots, 25$.
Then compute $\varphi(n)$ using the formula above and compare it
to your results in part (a).


10.15 Genetics 

In this section we investigate the propagation of an inherited trait in successive generations 
by computing powers of a matrix. 

Eigenvalues and Eigenvectors 
Diagonalization of a Matrix 
Intuitive Understanding of Limits 

Inheritance Traits

In this section we examine the inheritance of traits in animals or plants. The inherited
trait under consideration is assumed to be governed by a set of two genes, which we
designate by A and a. Under autosomal inheritance each individual in the population of
either gender possesses two of these genes, the possible pairings being designated AA,
Aa, and aa. This pair of genes is called the individual’s genotype, and it determines
how the trait controlled by the genes is manifested in the individual. For example,
in snapdragons a set of two genes determines the color of the flower. Genotype AA
produces red flowers, genotype Aa produces pink flowers, and genotype aa produces
white flowers. In humans, eye coloration is controlled through autosomal inheritance.
Genotypes AA and Aa have brown eyes, and genotype aa has blue eyes. In this case we
say that gene A dominates gene a, or that gene a is recessive to gene A, because genotype
Aa has the same outward trait as genotype AA.

In addition to autosomal inheritance we will also discuss X-linked inheritance. In
this type of inheritance, the male of the species possesses only one of the two possible
genes (A or a), and the female possesses a pair of the two genes (AA, Aa, or aa). In
humans, color blindness, hereditary baldness, hemophilia, and muscular dystrophy, to
name a few, are traits controlled by X-linked inheritance.

Below we explain the manner in which the genes of the parents are passed on to 
their offspring for the two types of inheritance. We construct matrix models that give 
the probable genotypes of the offspring in terms of the genotypes of the parents, and 
we use these matrix models to follow the genotype distribution of a population through 
successive generations. 

Autosomal Inheritance

In autosomal inheritance an individual inherits one gene from each of its parents’ pairs
of genes to form its own particular pair. As far as we know, it is a matter of chance
which of the two genes a parent passes on to the offspring. Thus, if one parent is of 
genotype Aa, it is equally likely that the offspring will inherit the A gene or the a gene 
from that parent. If one parent is of genotype aa and the other parent is of genotype 
Aa, the offspring will always receive an a gene from the aa parent and will receive either 
an A gene or an a gene, with equal probability, from the Aa parent. Consequently, each 
of the offspring has equal probability of being genotype aa or Aa. In Table 1 we list the 
probabilities of the possible genotypes of the offspring for all possible combinations of 
the genotypes of the parents. 


Table 1

Genotype      | Genotypes of Parents
of Offspring  | AA-AA | AA-Aa | AA-aa | Aa-Aa | Aa-aa | aa-aa
AA            | 1     | 1/2   | 0     | 1/4   | 0     | 0
Aa            | 0     | 1/2   | 1     | 1/2   | 1/2   | 0
aa            | 0     | 0     | 0     | 1/4   | 1/2   | 1
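The entries of Table 1 follow from the rule that each parent passes on either of its two genes with probability 1/2. A short Python sketch (the function name `offspring_distribution` is hypothetical) enumerates the four equally likely gene combinations for a given pair of parents and uses exact fractions so the results can be compared with the table directly.

```python
from fractions import Fraction
from itertools import product

def offspring_distribution(parent1, parent2):
    """Probabilities of offspring genotypes AA, Aa, aa for parents given as 'AA', 'Aa' or 'aa'."""
    dist = {"AA": Fraction(0), "Aa": Fraction(0), "aa": Fraction(0)}
    # Each parent contributes one of its two genes, each with probability 1/2
    for gene1, gene2 in product(parent1, parent2):
        genotype = "".join(sorted(gene1 + gene2))   # 'A' sorts before 'a', so aA becomes Aa
        dist[genotype] += Fraction(1, 4)
    return dist

pairings = [("AA", "AA"), ("AA", "Aa"), ("AA", "aa"),
            ("Aa", "Aa"), ("Aa", "aa"), ("aa", "aa")]
for p1, p2 in pairings:
    d = offspring_distribution(p1, p2)
    print(p1 + "-" + p2, d["AA"], d["Aa"], d["aa"])
```

Each printed line corresponds to one column of Table 1, read from top to bottom.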


► EXAMPLE 1 Distribution of Genotypes in a Population 

Suppose that a farmer has a large population of plants consisting of some distribution 
of all three possible genotypes AA, Aa, and aa. The farmer desires to undertake a 
breeding program in which each plant in the population is always fertilized with a plant 
of genotype AA and is then replaced by one of its offspring. We want to derive an 
expression for the distribution of the three possible genotypes in the population after 
any number of generations. 

For $n = 0, 1, 2, \ldots$, let us set

$a_n$ = fraction of plants of genotype AA in the $n$th generation
$b_n$ = fraction of plants of genotype Aa in the $n$th generation
$c_n$ = fraction of plants of genotype aa in the $n$th generation

Thus $a_0$, $b_0$, and $c_0$ specify the initial distribution of the genotypes. We also have that

\[ a_n + b_n + c_n = 1 \quad \text{for } n = 0, 1, 2, \ldots \]

From Table 1 we can determine the genotype distribution of each generation from the
genotype distribution of the preceding generation by the following equations:

\[ \begin{aligned} a_n &= a_{n-1} + \tfrac{1}{2} b_{n-1} \\ b_n &= c_{n-1} + \tfrac{1}{2} b_{n-1} \\ c_n &= 0 \end{aligned} \qquad n = 1, 2, \ldots \tag{1} \]

For example, the first of these three equations states that all the offspring of a plant of
genotype AA will be of genotype AA under this breeding program and that half of the
offspring of a plant of genotype Aa will be of genotype AA.

Equations (1) can be written in matrix notation as

\[ \mathbf{x}^{(n)} = M\mathbf{x}^{(n-1)}, \quad n = 1, 2, \ldots \tag{2} \]

where

\[ \mathbf{x}^{(n)} = \begin{bmatrix} a_n \\ b_n \\ c_n \end{bmatrix}, \quad \mathbf{x}^{(n-1)} = \begin{bmatrix} a_{n-1} \\ b_{n-1} \\ c_{n-1} \end{bmatrix}, \quad \text{and} \quad M = \begin{bmatrix} 1 & \tfrac{1}{2} & 0 \\ 0 & \tfrac{1}{2} & 1 \\ 0 & 0 & 0 \end{bmatrix} \]

Note that the three columns of the matrix $M$ are the same as the first three columns of
Table 1.

From Equation (2) it follows that

\[ \mathbf{x}^{(n)} = M\mathbf{x}^{(n-1)} = M^2\mathbf{x}^{(n-2)} = \cdots = M^n\mathbf{x}^{(0)} \tag{3} \]

Consequently, if we can find an explicit expression for $M^n$, we can use (3) to obtain an
explicit expression for $\mathbf{x}^{(n)}$. To find an explicit expression for $M^n$, we first diagonalize
$M$. That is, we find an invertible matrix $P$ and a diagonal matrix $D$ such that

\[ M = PDP^{-1} \tag{4} \]


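With such a diagonalization, we then have (see Exercise 1)

\[ M^n = PD^nP^{-1} \quad \text{for } n = 1, 2, \ldots \]

where

\[ D^n = \begin{bmatrix} \lambda_1 & 0 & 0 & \cdots & 0 \\ 0 & \lambda_2 & 0 & \cdots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ 0 & 0 & 0 & \cdots & \lambda_k \end{bmatrix}^n = \begin{bmatrix} \lambda_1^n & 0 & 0 & \cdots & 0 \\ 0 & \lambda_2^n & 0 & \cdots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ 0 & 0 & 0 & \cdots & \lambda_k^n \end{bmatrix} \]

The diagonalization of $M$ is accomplished by finding its eigenvalues and corresponding
eigenvectors. These are as follows (verify):

\[ \text{Eigenvalues:} \quad \lambda_1 = 1, \quad \lambda_2 = \tfrac{1}{2}, \quad \lambda_3 = 0 \]

\[ \text{Corresponding eigenvectors:} \quad \mathbf{v}_1 = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \quad \mathbf{v}_2 = \begin{bmatrix} 1 \\ -1 \\ 0 \end{bmatrix}, \quad \mathbf{v}_3 = \begin{bmatrix} 1 \\ -2 \\ 1 \end{bmatrix} \]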

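Thus, in Equation (4) we have

\[ D = \begin{bmatrix} \lambda_1 & 0 & 0 \\ 0 & \lambda_2 & 0 \\ 0 & 0 & \lambda_3 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \tfrac{1}{2} & 0 \\ 0 & 0 & 0 \end{bmatrix} \]

and

\[ P = [\mathbf{v}_1 \mid \mathbf{v}_2 \mid \mathbf{v}_3] = \begin{bmatrix} 1 & 1 & 1 \\ 0 & -1 & -2 \\ 0 & 0 & 1 \end{bmatrix} \]

Therefore,

\[ \mathbf{x}^{(n)} = PD^nP^{-1}\mathbf{x}^{(0)} = \begin{bmatrix} 1 & 1 & 1 \\ 0 & -1 & -2 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & \left(\tfrac{1}{2}\right)^n & 0 \\ 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} 1 & 1 & 1 \\ 0 & -1 & -2 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} a_0 \\ b_0 \\ c_0 \end{bmatrix} \]

or

\[ \begin{bmatrix} a_n \\ b_n \\ c_n \end{bmatrix} = \begin{bmatrix} 1 & 1 - \left(\tfrac{1}{2}\right)^n & 1 - \left(\tfrac{1}{2}\right)^{n-1} \\ 0 & \left(\tfrac{1}{2}\right)^n & \left(\tfrac{1}{2}\right)^{n-1} \\ 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} a_0 \\ b_0 \\ c_0 \end{bmatrix} = \begin{bmatrix} a_0 + b_0 + c_0 - \left(\tfrac{1}{2}\right)^n b_0 - \left(\tfrac{1}{2}\right)^{n-1} c_0 \\ \left(\tfrac{1}{2}\right)^n b_0 + \left(\tfrac{1}{2}\right)^{n-1} c_0 \\ 0 \end{bmatrix} \]

Using the fact that $a_0 + b_0 + c_0 = 1$, we thus have

\[ \begin{aligned} a_n &= 1 - \left(\tfrac{1}{2}\right)^n b_0 - \left(\tfrac{1}{2}\right)^{n-1} c_0 \\ b_n &= \left(\tfrac{1}{2}\right)^n b_0 + \left(\tfrac{1}{2}\right)^{n-1} c_0 \\ c_n &= 0 \end{aligned} \qquad n = 1, 2, \ldots \tag{5} \]

These are explicit formulas for the fractions of the three genotypes in the $n$th generation
of plants in terms of the initial genotype fractions.

Because $\left(\tfrac{1}{2}\right)^n$ tends to zero as $n$ approaches infinity, it follows from these equations
that

\[ a_n \to 1, \qquad b_n \to 0, \qquad c_n = 0 \]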

as n approaches infinity. That is, in the limit all plants in the population will be genotype 
AA. 
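Formulas (5) can be checked numerically by iterating Equation (2) and comparing each generation with the closed form. The Python sketch below uses exact fractions; the initial distribution `x0` is a hypothetical choice made only for illustration, and the function names are not from the text.

```python
from fractions import Fraction

# Transition matrix M of Example 1
M = [[Fraction(1), Fraction(1, 2), Fraction(0)],
     [Fraction(0), Fraction(1, 2), Fraction(1)],
     [Fraction(0), Fraction(0), Fraction(0)]]

def step(x):
    """One generation: x^(n) = M x^(n-1)."""
    return [sum(M[i][j] * x[j] for j in range(3)) for i in range(3)]

def closed_form(n, a0, b0, c0):
    """Formulas (5) for the nth generation, n >= 1."""
    half = Fraction(1, 2)
    a_n = 1 - half**n * b0 - half**(n - 1) * c0
    b_n = half**n * b0 + half**(n - 1) * c0
    return [a_n, b_n, Fraction(0)]

def simulate(x0, generations):
    x = list(x0)
    history = []
    for _ in range(generations):
        x = step(x)
        history.append(x)
    return history

# Hypothetical initial distribution (a0, b0, c0); any fractions summing to 1 would do
x0 = [Fraction(1, 5), Fraction(1, 2), Fraction(3, 10)]
history = simulate(x0, 10)
for n, x in enumerate(history, start=1):
    assert x == closed_form(n, *x0)      # iteration and closed form agree exactly
print([float(v) for v in history[-1]])   # fraction of AA is already close to 1
```

After ten generations the fraction of genotype Aa has shrunk by roughly a factor of $2^{10}$, which illustrates the limiting behavior described above.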


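► EXAMPLE 2 Modifying Example 1

We can modify Example 1 so that instead of each plant being fertilized with one of
genotype AA, each plant is fertilized with a plant of its own genotype. Using the same
notation as in Example 1, we then find

\[ \mathbf{x}^{(n)} = M^n\mathbf{x}^{(0)} \]

where

\[ M = \begin{bmatrix} 1 & \tfrac{1}{4} & 0 \\ 0 & \tfrac{1}{2} & 0 \\ 0 & \tfrac{1}{4} & 1 \end{bmatrix} \]

The columns of this new matrix $M$ are the same as the columns of Table 1 corresponding
to parents with genotypes AA-AA, Aa-Aa, and aa-aa.

The eigenvalues of $M$ are (verify)

\[ \lambda_1 = 1, \quad \lambda_2 = 1, \quad \lambda_3 = \tfrac{1}{2} \]

The eigenvalue $\lambda_1 = 1$ has multiplicity two and its corresponding eigenspace is
two-dimensional. Picking two linearly independent eigenvectors $\mathbf{v}_1$ and $\mathbf{v}_2$ in that eigenspace,
and a single eigenvector $\mathbf{v}_3$ for the simple eigenvalue $\lambda_3 = \tfrac{1}{2}$, we have (verify)

\[ \mathbf{v}_1 = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \quad \mathbf{v}_2 = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}, \quad \mathbf{v}_3 = \begin{bmatrix} 1 \\ -2 \\ 1 \end{bmatrix} \]

The calculations for $\mathbf{x}^{(n)}$ are then

\[ \mathbf{x}^{(n)} = M^n\mathbf{x}^{(0)} = PD^nP^{-1}\mathbf{x}^{(0)} = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 0 & -2 \\ 0 & 1 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & \left(\tfrac{1}{2}\right)^n \end{bmatrix} \begin{bmatrix} 1 & \tfrac{1}{2} & 0 \\ 0 & \tfrac{1}{2} & 1 \\ 0 & -\tfrac{1}{2} & 0 \end{bmatrix} \begin{bmatrix} a_0 \\ b_0 \\ c_0 \end{bmatrix} \]

\[ = \begin{bmatrix} 1 & \tfrac{1}{2} - \left(\tfrac{1}{2}\right)^{n+1} & 0 \\ 0 & \left(\tfrac{1}{2}\right)^n & 0 \\ 0 & \tfrac{1}{2} - \left(\tfrac{1}{2}\right)^{n+1} & 1 \end{bmatrix} \begin{bmatrix} a_0 \\ b_0 \\ c_0 \end{bmatrix} \]

Thus,

\[ \begin{aligned} a_n &= a_0 + \left[\tfrac{1}{2} - \left(\tfrac{1}{2}\right)^{n+1}\right] b_0 \\ b_n &= \left(\tfrac{1}{2}\right)^n b_0 \\ c_n &= c_0 + \left[\tfrac{1}{2} - \left(\tfrac{1}{2}\right)^{n+1}\right] b_0 \end{aligned} \qquad n = 1, 2, \ldots \tag{6} \]

In the limit, as $n$ tends to infinity, $\left(\tfrac{1}{2}\right)^n \to 0$ and $\left(\tfrac{1}{2}\right)^{n+1} \to 0$, so

\[ a_n \to a_0 + \tfrac{1}{2} b_0, \qquad b_n \to 0, \qquad c_n \to c_0 + \tfrac{1}{2} b_0 \]

Thus, fertilization of each plant with one of its own genotype produces a population that
in the limit contains only genotypes AA and aa.


Autosomal Recessive Diseases

There are many genetic diseases governed by autosomal inheritance in which a normal
gene A dominates an abnormal gene a. Genotype AA is a normal individual; genotype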
Aa is a carrier of the disease but is not afflicted with the disease; and genotype aa is 
afflicted with the disease. In humans such genetic diseases are often associated with a 
particular racial group — for instance, cystic fibrosis (predominant among Caucasians), 
sickle-cell anemia (predominant among people of African origin), Cooley’s anemia (pre- 
dominant among people of Mediterranean origin), and Tay-Sachs disease (predominant 
among Eastern European Jews). 

Suppose that an animal breeder has a population of animals that carries an autosomal 
recessive disease. Suppose further that those animals afflicted with the disease do not 
survive to maturity. One possible way to control such a disease is for the breeder to 
always mate a female, regardless of her genotype, with a normal male. In this way, all 
future offspring will either have a normal father and a normal mother ( AA-AA matings) 
or a normal father and a carrier mother ( AA-Aa matings). There can be no AA-aa 
matings since animals of genotype aa do not survive to maturity. Under this type of 
mating program no future offspring will be afflicted with the disease, although there will 
still be carriers in future generations. Let us now determine the fraction of carriers in 
future generations. We set

\[ \mathbf{x}^{(n)} = \begin{bmatrix} a_n \\ b_n \end{bmatrix}, \quad n = 1, 2, \ldots \]

where

$a_n$ = fraction of population of genotype AA in the $n$th generation
$b_n$ = fraction of population of genotype Aa (carriers) in the $n$th generation

Because each offspring has at least one normal parent, we may consider the controlled
mating program as one of continual mating with genotype AA, as in Example 1. Thus,
the transition of genotype distributions from one generation to the next is governed by
the equation

\[ \mathbf{x}^{(n)} = M\mathbf{x}^{(n-1)}, \quad n = 1, 2, \ldots \]

where

\[ M = \begin{bmatrix} 1 & \tfrac{1}{2} \\ 0 & \tfrac{1}{2} \end{bmatrix} \]

Because we know the initial distribution $\mathbf{x}^{(0)}$, the distribution of genotypes in the $n$th
generation is thus given by

\[ \mathbf{x}^{(n)} = M^n\mathbf{x}^{(0)}, \quad n = 1, 2, \ldots \]

The diagonalization of $M$ is easily carried out (see Exercise 4) and leads to

\[ \mathbf{x}^{(n)} = PD^nP^{-1}\mathbf{x}^{(0)} = \begin{bmatrix} 1 & 1 \\ 0 & -1 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & \left(\tfrac{1}{2}\right)^n \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 0 & -1 \end{bmatrix} \begin{bmatrix} a_0 \\ b_0 \end{bmatrix} = \begin{bmatrix} 1 & 1 - \left(\tfrac{1}{2}\right)^n \\ 0 & \left(\tfrac{1}{2}\right)^n \end{bmatrix} \begin{bmatrix} a_0 \\ b_0 \end{bmatrix} = \begin{bmatrix} a_0 + b_0 - \left(\tfrac{1}{2}\right)^n b_0 \\ \left(\tfrac{1}{2}\right)^n b_0 \end{bmatrix} \]

Because $a_0 + b_0 = 1$, we have

\[ \begin{aligned} a_n &= 1 - \left(\tfrac{1}{2}\right)^n b_0 \\ b_n &= \left(\tfrac{1}{2}\right)^n b_0 \end{aligned} \qquad n = 1, 2, \ldots \tag{7} \]

Thus, as $n$ tends to infinity, we have

\[ a_n \to 1, \qquad b_n \to 0 \]

so in the limit there will be no carriers in the population.

From (7) we see that

\[ b_n = \tfrac{1}{2} b_{n-1}, \quad n = 1, 2, \ldots \tag{8} \]

That is, the fraction of carriers in each generation is one-half the fraction of carriers in
the preceding generation. It would be of interest also to investigate the propagation of
carriers under random mating, when two animals mate without regard to their genotypes.
Unfortunately, such random mating leads to nonlinear equations, and the techniques of
this section are not applicable. However, by other techniques it can be shown that under
random mating, Equation (8) is replaced by

\[ b_n = \frac{b_{n-1}}{1 + \tfrac{1}{2} b_{n-1}}, \quad n = 1, 2, \ldots \tag{9} \]

As a numerical example, suppose that the breeder starts with a population in which
10% of the animals are carriers. Under the controlled-mating program governed by
Equation (8), the percentage of carriers can be reduced to 5% in one generation. But
under random mating, Equation (9) predicts that 9.5% of the population will be carriers
after one generation ($b_n = .095$ if $b_{n-1} = .10$). In addition, under controlled mating no
offspring will ever be afflicted with the disease, but with random mating it can be shown
that about 1 in 400 offspring will be born with the disease when 10% of the population
are carriers.


X-Linked Inheritance

As mentioned in the introduction, in X-linked inheritance the male possesses one gene
(A or a) and the female possesses two genes (AA, Aa, or aa). The term X-linked is used
because such genes are found on the X-chromosome, of which the male has one and the
female has two. The inheritance of such genes is as follows: A male offspring receives
one of his mother’s two genes with equal probability, and a female offspring receives
the one gene of her father and one of her mother’s two genes with equal probability.
Readers familiar with basic probability can verify that this type of inheritance leads to
the genotype probabilities in Table 2.


Table 2

                       | Genotypes of Parents (Father, Mother)
Offspring              | (A, AA) | (A, Aa) | (A, aa) | (a, AA) | (a, Aa) | (a, aa)
Male     A             | 1       | 1/2     | 0       | 1       | 1/2     | 0
Male     a             | 0       | 1/2     | 1       | 0       | 1/2     | 1
Female   AA            | 1       | 1/2     | 0       | 0       | 0       | 0
Female   Aa            | 0       | 1/2     | 1       | 1       | 1/2     | 0
Female   aa            | 0       | 0       | 0       | 0       | 1/2     | 1

We will discuss a program of inbreeding in connection with X-linked inheritance. We 
begin with a male and female; select two of their offspring at random, one of each gender, 
and mate them; select two of the resulting offspring and mate them; and so forth. Such 
inbreeding is commonly performed with animals. (Among humans, such brother-sister 
marriages were used by the rulers of ancient Egypt to keep the royal line pure.) 

The original male-female pair can be one of the six types, corresponding to the six 
columns of Table 2: 


(A, A A), (A,Aa), ( A,aa ), ( a, AA ), ( a,Aa ), (a, aa) 

The sibling pairs mated in each successive generation have certain probabilities of being 
one of these six types. To compute these probabilities, for n = 0, 1, 2, . . . , let us set 

a n — probability sibling-pair mated in nth generation is type (A, A A) 

b n — probability sibling-pair mated in nth generation is type (A, Aa) 

c„ = probability sibling-pair mated in nth generation is type (A, aa) 

d n — probability sibling-pair mated in nth generation is type (a, A A) 

e n — probability sibling-pair mated in nth generation is type ( a , Aa) 

f n = probability sibling-pair mated in nth generation is type ( a , aa) 

With these probabilities we form a column vector 




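\[
\mathbf{x}^{(n)} = \begin{bmatrix} a_n \\ b_n \\ c_n \\ d_n \\ e_n \\ f_n \end{bmatrix}, \qquad n = 0, 1, 2, \ldots
\]

From Table 2 it follows that

\[
\mathbf{x}^{(n)} = M\mathbf{x}^{(n-1)}, \qquad n = 1, 2, \ldots \tag{10}
\]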
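where

\[
M =
\begin{array}{c|cccccc}
 & (A, AA) & (A, Aa) & (A, aa) & (a, AA) & (a, Aa) & (a, aa) \\ \hline
(A, AA) & 1 & \tfrac14 & 0 & 0 & 0 & 0 \\
(A, Aa) & 0 & \tfrac14 & 0 & 1 & \tfrac14 & 0 \\
(A, aa) & 0 & 0 & 0 & 0 & \tfrac14 & 0 \\
(a, AA) & 0 & \tfrac14 & 0 & 0 & 0 & 0 \\
(a, Aa) & 0 & \tfrac14 & 1 & 0 & \tfrac14 & 0 \\
(a, aa) & 0 & 0 & 0 & 0 & \tfrac14 & 1
\end{array}
\]
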

For example, suppose that in the $(n-1)$-st generation, the sibling pair mated is type
(A, Aa). Then their male offspring will be genotype A or a with equal probability, and
their female offspring will be genotype AA or Aa with equal probability. Because one
of the male offspring and one of the female offspring are chosen at random for mating,
the next sibling pair will be one of type (A, AA), (A, Aa), (a, AA), or (a, Aa) with
equal probability. Thus, the second column of $M$ contains $\tfrac14$ in each of the four rows
corresponding to these four sibling pairs. (See Exercise 9 for the remaining columns.)

As in our previous examples, it follows from (10) that 

\[
\mathbf{x}^{(n)} = M^n\mathbf{x}^{(0)}, \qquad n = 1, 2, \ldots \tag{11}
\]


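After lengthy calculations, the eigenvalues and eigenvectors of $M$ turn out to be

\[
\lambda_1 = 1, \quad \lambda_2 = 1, \quad \lambda_3 = \tfrac12, \quad \lambda_4 = -\tfrac12, \quad
\lambda_5 = \tfrac14(1 + \sqrt5), \quad \lambda_6 = \tfrac14(1 - \sqrt5)
\]

\[
\mathbf{v}_1 = \begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}, \quad
\mathbf{v}_2 = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 1 \end{bmatrix}, \quad
\mathbf{v}_3 = \begin{bmatrix} -1 \\ 2 \\ -1 \\ 1 \\ -2 \\ 1 \end{bmatrix}, \quad
\mathbf{v}_4 = \begin{bmatrix} 1 \\ -6 \\ -3 \\ 3 \\ 6 \\ -1 \end{bmatrix}
\]

\[
\mathbf{v}_5 = \begin{bmatrix} \tfrac14(-3 - \sqrt5) \\ 1 \\ \tfrac14(-1 + \sqrt5) \\ \tfrac14(-1 + \sqrt5) \\ 1 \\ \tfrac14(-3 - \sqrt5) \end{bmatrix}, \quad
\mathbf{v}_6 = \begin{bmatrix} \tfrac14(-3 + \sqrt5) \\ 1 \\ \tfrac14(-1 - \sqrt5) \\ \tfrac14(-1 - \sqrt5) \\ 1 \\ \tfrac14(-3 + \sqrt5) \end{bmatrix}
\]
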

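The diagonalization of $M$ then leads to

\[
\mathbf{x}^{(n)} = PD^nP^{-1}\mathbf{x}^{(0)}, \qquad n = 1, 2, \ldots \tag{12}
\]

where

\[
P = \begin{bmatrix}
1 & 0 & -1 & 1 & \tfrac14(-3 - \sqrt5) & \tfrac14(-3 + \sqrt5) \\
0 & 0 & 2 & -6 & 1 & 1 \\
0 & 0 & -1 & -3 & \tfrac14(-1 + \sqrt5) & \tfrac14(-1 - \sqrt5) \\
0 & 0 & 1 & 3 & \tfrac14(-1 + \sqrt5) & \tfrac14(-1 - \sqrt5) \\
0 & 0 & -2 & 6 & 1 & 1 \\
0 & 1 & 1 & -1 & \tfrac14(-3 - \sqrt5) & \tfrac14(-3 + \sqrt5)
\end{bmatrix}
\]

\[
D^n = \begin{bmatrix}
1 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & \left(\tfrac12\right)^n & 0 & 0 & 0 \\
0 & 0 & 0 & \left(-\tfrac12\right)^n & 0 & 0 \\
0 & 0 & 0 & 0 & \left[\tfrac14(1 + \sqrt5)\right]^n & 0 \\
0 & 0 & 0 & 0 & 0 & \left[\tfrac14(1 - \sqrt5)\right]^n
\end{bmatrix}
\]

\[
P^{-1} = \begin{bmatrix}
1 & \tfrac23 & \tfrac13 & \tfrac23 & \tfrac13 & 0 \\
0 & \tfrac13 & \tfrac23 & \tfrac13 & \tfrac23 & 1 \\
0 & \tfrac18 & -\tfrac14 & \tfrac14 & -\tfrac18 & 0 \\
0 & -\tfrac1{24} & -\tfrac1{12} & \tfrac1{12} & \tfrac1{24} & 0 \\
0 & \tfrac1{20}(5 + \sqrt5) & \tfrac15\sqrt5 & \tfrac15\sqrt5 & \tfrac1{20}(5 + \sqrt5) & 0 \\
0 & \tfrac1{20}(5 - \sqrt5) & -\tfrac15\sqrt5 & -\tfrac15\sqrt5 & \tfrac1{20}(5 - \sqrt5) & 0
\end{bmatrix}
\]
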

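We will not write out the matrix product in (12), as it is rather unwieldy. However, if a
specific vector $\mathbf{x}^{(0)}$ is given, the calculation for $\mathbf{x}^{(n)}$ is not too cumbersome (see Exercise 6).

Because the absolute values of the last four diagonal entries of $D$ are less than 1, we
see that as $n$ tends to infinity,

\[
D^n \to \begin{bmatrix}
1 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}
\]

And so, from Equation (12),

\[
\mathbf{x}^{(n)} \to P \begin{bmatrix}
1 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix} P^{-1}\mathbf{x}^{(0)}
\]

Performing the matrix multiplication on the right, we obtain (verify)

\[
\mathbf{x}^{(n)} \to \begin{bmatrix}
a_0 + \tfrac23 b_0 + \tfrac13 c_0 + \tfrac23 d_0 + \tfrac13 e_0 \\
0 \\ 0 \\ 0 \\ 0 \\
f_0 + \tfrac13 b_0 + \tfrac23 c_0 + \tfrac13 d_0 + \tfrac23 e_0
\end{bmatrix} \tag{13}
\]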
That is, in the limit all sibling pairs will be either type (A, AA) or type (a, aa). For
example, if the initial parents are type (A, Aa) (that is, $b_0 = 1$ and $a_0 = c_0 = d_0 = e_0 = f_0 = 0$),
then as $n$ tends to infinity,

\[
\mathbf{x}^{(n)} \to \begin{bmatrix} \tfrac23 \\ 0 \\ 0 \\ 0 \\ 0 \\ \tfrac13 \end{bmatrix}
\]

Thus, in the limit there is probability $\tfrac23$ that the sibling pairs will be (A, AA), and
probability $\tfrac13$ that they will be (a, aa).
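
The limit in Equation (13) can also be checked numerically by applying $M$ repeatedly, without diagonalizing. The following is a minimal sketch in Python using exact fractions; the helper names `step`, `iterate`, and `limit_from_13` are illustrative and not part of the text, and the choice of 60 generations is an arbitrary assumption that is large enough for the transient terms to become negligible.

```python
from fractions import Fraction as F

# Transition matrix M from Equation (10); rows and columns are ordered
# (A,AA), (A,Aa), (A,aa), (a,AA), (a,Aa), (a,aa).
q = F(1, 4)
M = [
    [1, q, 0, 0, 0, 0],
    [0, q, 0, 1, q, 0],
    [0, 0, 0, 0, q, 0],
    [0, q, 0, 0, 0, 0],
    [0, q, 1, 0, q, 0],
    [0, 0, 0, 0, q, 1],
]

def step(M, x):
    """Return the product M x for a square matrix M and a vector x."""
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in M]

def iterate(M, x0, n):
    """Return x^(n) = M^n x^(0) by applying M to x^(0) n times."""
    x = list(x0)
    for _ in range(n):
        x = step(M, x)
    return x

def limit_from_13(x0):
    """Limiting vector predicted by Equation (13)."""
    a0, b0, c0, d0, e0, f0 = x0
    first = a0 + F(2, 3) * b0 + F(1, 3) * c0 + F(2, 3) * d0 + F(1, 3) * e0
    last = f0 + F(1, 3) * b0 + F(2, 3) * c0 + F(1, 3) * d0 + F(2, 3) * e0
    return [first, 0, 0, 0, 0, last]

x0 = [0, 1, 0, 0, 0, 0]            # initial parents of type (A, Aa)
x60 = iterate(M, x0, 60)           # distribution after 60 generations
print([round(float(v), 6) for v in x60])
print([str(v) for v in limit_from_13(x0)])
```

Because each column of $M$ sums to 1, the entries of every iterate sum to 1, which gives a quick sanity check on the matrix entries.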


Exercise Set 10.15 

1. Show that if $M = PDP^{-1}$, then $M^n = PD^nP^{-1}$ for $n = 1, 2, \ldots$.

2. In Example 1 suppose that the plants are always fertilized with
a plant of genotype Aa rather than one of genotype AA. Derive
formulas for the fractions of the plants of genotypes AA, Aa,
and aa in the nth generation. Also, find the limiting genotype
distribution as n tends to infinity.

3. In Example 1 suppose that the initial plants are fertilized with
genotype AA, the first generation is fertilized with genotype
Aa, the second generation is fertilized with genotype AA, and
this alternating pattern of fertilization is kept up. Find formulas
for the fractions of the plants of genotypes AA, Aa, and aa
in the nth generation.

4. In the section on autosomal recessive diseases, find the eigenval- 
ues and eigenvectors of the matrix M and verify Equation (7). 

5. Suppose that a breeder has an animal population in which 25% 
of the population are carriers of an autosomal recessive disease. 
If the breeder allows the animals to mate irrespective of their 
genotype, use Equation (9) to calculate the number of genera- 
tions required for the percentage of carriers to fall from 25% to 
10%. If the breeder instead implements the controlled-mating 
program determined by Equation (8), what will the percentage 
of carriers be after the same number of generations? 


6. In the section on X-linked inheritance, suppose that the initial
parents are equally likely to be of any of the six possible
genotype parents; that is,

\[
\mathbf{x}^{(0)} = \begin{bmatrix} \tfrac16 \\ \tfrac16 \\ \tfrac16 \\ \tfrac16 \\ \tfrac16 \\ \tfrac16 \end{bmatrix}
\]

Using Equation (12), calculate $\mathbf{x}^{(n)}$ and also calculate the limit
of $\mathbf{x}^{(n)}$ as $n$ tends to infinity.

7. From (13) show that under X-linked inheritance with inbreeding,
the probability that the limiting sibling pairs will be of type
(A, AA) is the same as the proportion of A genes in the initial
population.

8. In X-linked inheritance suppose that none of the females of 
genotype Aa survive to maturity. Under inbreeding the possi- 
ble sibling pairs are then 

(A, AA), (A, aa), (a, AA), and (a, aa)

Find the transition matrix that describes how the genotype dis- 
tribution changes in one generation. 

9. Derive the matrix M in Equation (10) from Table 2. 


Working with Technology 


The following exercises are designed to be solved using a technology
utility. Typically, this will be MATLAB, Mathematica, Maple, 
Derive, or Mathcad, but it may also be some other type of linear 
algebra software or a scientific calculator with some linear algebra 
capabilities. For each exercise you will need to read the relevant 
documentation for the particular utility you are using. The goal 
of these exercises is to provide you with a basic proficiency with 
your technology utility. Once you have mastered the techniques 
in these exercises, you will be able to use your technology utility 
to solve many of the problems in the regular exercise sets. 


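T1. (a) Use a computer to verify that the eigenvalues and eigenvectors of

\[
M = \begin{bmatrix}
1 & \tfrac14 & 0 & 0 & 0 & 0 \\
0 & \tfrac14 & 0 & 1 & \tfrac14 & 0 \\
0 & 0 & 0 & 0 & \tfrac14 & 0 \\
0 & \tfrac14 & 0 & 0 & 0 & 0 \\
0 & \tfrac14 & 1 & 0 & \tfrac14 & 0 \\
0 & 0 & 0 & 0 & \tfrac14 & 1
\end{bmatrix}
\]

as given in the text are correct.

(b) Starting with $\mathbf{x}^{(n)} = M\mathbf{x}^{(n-1)}$ and the assumption that

\[
\lim_{n\to\infty} \mathbf{x}^{(n)} = \mathbf{x}
\]

exists, we must have

\[
\lim_{n\to\infty} \mathbf{x}^{(n)} = M \lim_{n\to\infty} \mathbf{x}^{(n-1)} \quad \text{or} \quad \mathbf{x} = M\mathbf{x}
\]

This suggests that $\mathbf{x}$ can be solved directly using the equation
$(M - I)\mathbf{x} = \mathbf{0}$. Use a computer to solve the equation
$\mathbf{x} = M\mathbf{x}$, where

\[
\mathbf{x} = \begin{bmatrix} a \\ b \\ c \\ d \\ e \\ f \end{bmatrix}
\]

and $a + b + c + d + e + f = 1$; compare your results to
Equation (13). Explain why the solution to $(M - I)\mathbf{x} = \mathbf{0}$
along with $a + b + c + d + e + f = 1$ is not specific enough
to determine $\lim_{n\to\infty} \mathbf{x}^{(n)}$.

T2. (a) Given

\[
P = \begin{bmatrix}
1 & 0 & -1 & 1 & \tfrac14(-3 - \sqrt5) & \tfrac14(-3 + \sqrt5) \\
0 & 0 & 2 & -6 & 1 & 1 \\
0 & 0 & -1 & -3 & \tfrac14(-1 + \sqrt5) & \tfrac14(-1 - \sqrt5) \\
0 & 0 & 1 & 3 & \tfrac14(-1 + \sqrt5) & \tfrac14(-1 - \sqrt5) \\
0 & 0 & -2 & 6 & 1 & 1 \\
0 & 1 & 1 & -1 & \tfrac14(-3 - \sqrt5) & \tfrac14(-3 + \sqrt5)
\end{bmatrix}
\]

from Equation (12) and

\[
\lim_{n\to\infty} D^n = \begin{bmatrix}
1 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}
\]

use a computer to show that

\[
\lim_{n\to\infty} M^n = \begin{bmatrix}
1 & \tfrac23 & \tfrac13 & \tfrac23 & \tfrac13 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & \tfrac13 & \tfrac23 & \tfrac13 & \tfrac23 & 1
\end{bmatrix}
\]

(b) Use a computer to calculate $M^n$ for $n = 10, 20, 30, 40, 50, 60, 70$,
and then compare your results to the limit in part (a).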


10.16 Age-Specific Population Growth 

In this section we investigate, using the Leslie matrix model, the growth over time of a 
female population that is divided into age classes. We then determine the limiting age 
distribution and growth rate of the population. 


Eigenvalues and Eigenvectors 
Diagonalization of a Matrix 
Intuitive Understanding of Limits 


One of the most common models of population growth used by demographers is the 
so-called Leslie model developed in the 1940s. This model describes the growth of the 
female portion of a human or animal population. In this model the females are divided 
into age classes of equal duration. To be specific, suppose that the maximum age attained 
by any female in the population is L years (or some other time unit) and we divide the 
population into n age classes. Then each class is L/n years in duration. We label the age 
classes according to Table 1 . 


Table 1

Age Class    Age Interval
1            [0, L/n)
2            [L/n, 2L/n)
3            [2L/n, 3L/n)
⋮            ⋮
n - 1        [(n - 2)L/n, (n - 1)L/n)
n            [(n - 1)L/n, L]

Suppose that we know the number of females in each of the $n$ classes at time $t = 0$. In
particular, let there be $x_1^{(0)}$ females in the first class, $x_2^{(0)}$ females in the second class, and
so forth. With these $n$ numbers we form a column vector:

\[
\mathbf{x}^{(0)} = \begin{bmatrix} x_1^{(0)} \\ x_2^{(0)} \\ \vdots \\ x_n^{(0)} \end{bmatrix}
\]

We call this vector the initial age distribution vector.

As time progresses, the number of females within each of the n classes changes 
because of three biological processes: birth, death, and aging. By describing these three 
processes quantitatively, we will see how to project the initial age distribution vector into 
the future. 

The easiest way to study the aging process is to observe the population at discrete
times, say, $t_0, t_1, t_2, \ldots, t_k, \ldots$. The Leslie model requires that the duration between
any two successive observation times be the same as the duration of the age intervals.
Therefore, we set

\[
\begin{aligned}
t_0 &= 0 \\
t_1 &= L/n \\
t_2 &= 2L/n \\
&\;\;\vdots \\
t_k &= kL/n \\
&\;\;\vdots
\end{aligned}
\]

With this assumption, all females in the $(i+1)$-st class at time $t_{k+1}$ were in the $i$th class
at time $t_k$.

The birth and death processes between two successive observation times can be
described by means of the following demographic parameters:

$a_i$  ($i = 1, 2, \ldots, n$)      The average number of daughters born to each female during the
                                    time she is in the $i$th age class

$b_i$  ($i = 1, 2, \ldots, n-1$)    The fraction of females in the $i$th age class that can be expected to
                                    survive and pass into the $(i+1)$-st age class

By their definitions, we have that

(i) $a_i \ge 0$ for $i = 1, 2, \ldots, n$

(ii) $0 < b_i \le 1$ for $i = 1, 2, \ldots, n-1$

Note that we do not allow any $b_i$ to equal zero, because then no females would survive
beyond the $i$th age class. We also assume that at least one $a_i$ is positive so that some
births occur. Any age class for which the corresponding value of $a_i$ is positive is called
a fertile age class.
We next define the age distribution vector $\mathbf{x}^{(k)}$ at time $t_k$ by

\[
\mathbf{x}^{(k)} = \begin{bmatrix} x_1^{(k)} \\ x_2^{(k)} \\ \vdots \\ x_n^{(k)} \end{bmatrix}
\]

where $x_i^{(k)}$ is the number of females in the $i$th age class at time $t_k$. Now, at time $t_k$, the
females in the first age class are just those daughters born between times $t_{k-1}$ and $t_k$.
Thus, we can write


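\[
\left\{\begin{array}{c}\text{number of}\\ \text{females}\\ \text{in class 1}\\ \text{at time } t_k\end{array}\right\}
=
\left\{\begin{array}{c}\text{number of}\\ \text{daughters}\\ \text{born to}\\ \text{females in}\\ \text{class 1}\\ \text{between times}\\ t_{k-1} \text{ and } t_k\end{array}\right\}
+
\left\{\begin{array}{c}\text{number of}\\ \text{daughters}\\ \text{born to}\\ \text{females in}\\ \text{class 2}\\ \text{between times}\\ t_{k-1} \text{ and } t_k\end{array}\right\}
+ \cdots +
\left\{\begin{array}{c}\text{number of}\\ \text{daughters}\\ \text{born to}\\ \text{females in}\\ \text{class } n\\ \text{between times}\\ t_{k-1} \text{ and } t_k\end{array}\right\}
\]

or, mathematically,

\[
x_1^{(k)} = a_1 x_1^{(k-1)} + a_2 x_2^{(k-1)} + \cdots + a_n x_n^{(k-1)} \tag{1}
\]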


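The females in the $(i+1)$-st age class ($i = 1, 2, \ldots, n-1$) at time $t_k$ are those females
in the $i$th class at time $t_{k-1}$ who are still alive at time $t_k$. Thus,

\[
\left\{\begin{array}{c}\text{number of}\\ \text{females in}\\ \text{class } i+1\\ \text{at time } t_k\end{array}\right\}
=
\left\{\begin{array}{c}\text{fraction of}\\ \text{females in}\\ \text{class } i\\ \text{who survive}\\ \text{and pass into}\\ \text{class } i+1\end{array}\right\}
\left\{\begin{array}{c}\text{number of}\\ \text{females in}\\ \text{class } i\\ \text{at time } t_{k-1}\end{array}\right\}
\]

or, mathematically,

\[
x_{i+1}^{(k)} = b_i x_i^{(k-1)}, \qquad i = 1, 2, \ldots, n-1 \tag{2}
\]

Using matrix notation, we can write Equations (1) and (2) as

\[
\begin{bmatrix} x_1^{(k)} \\ x_2^{(k)} \\ x_3^{(k)} \\ \vdots \\ x_n^{(k)} \end{bmatrix}
=
\begin{bmatrix}
a_1 & a_2 & a_3 & \cdots & a_{n-1} & a_n \\
b_1 & 0 & 0 & \cdots & 0 & 0 \\
0 & b_2 & 0 & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & & \vdots & \vdots \\
0 & 0 & 0 & \cdots & b_{n-1} & 0
\end{bmatrix}
\begin{bmatrix} x_1^{(k-1)} \\ x_2^{(k-1)} \\ x_3^{(k-1)} \\ \vdots \\ x_n^{(k-1)} \end{bmatrix}
\]

or more compactly as

\[
\mathbf{x}^{(k)} = L\mathbf{x}^{(k-1)}, \qquad k = 1, 2, \ldots \tag{3}
\]

where $L$ is the Leslie matrix

\[
L = \begin{bmatrix}
a_1 & a_2 & a_3 & \cdots & a_{n-1} & a_n \\
b_1 & 0 & 0 & \cdots & 0 & 0 \\
0 & b_2 & 0 & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & & \vdots & \vdots \\
0 & 0 & 0 & \cdots & b_{n-1} & 0
\end{bmatrix} \tag{4}
\]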
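Limiting Behavior


From Equation (3) it follows that

\[
\begin{aligned}
\mathbf{x}^{(1)} &= L\mathbf{x}^{(0)} \\
\mathbf{x}^{(2)} &= L\mathbf{x}^{(1)} = L^2\mathbf{x}^{(0)} \\
\mathbf{x}^{(3)} &= L\mathbf{x}^{(2)} = L^3\mathbf{x}^{(0)} \\
&\;\;\vdots \\
\mathbf{x}^{(k)} &= L\mathbf{x}^{(k-1)} = L^k\mathbf{x}^{(0)}
\end{aligned} \tag{5}
\]

Thus, if we know the initial age distribution $\mathbf{x}^{(0)}$ and the Leslie matrix $L$, we can determine
the female age distribution at any later time.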


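► EXAMPLE 1 Female Age Distribution for Animals

Suppose that the oldest age attained by the females in a certain animal population is 15
years and we divide the population into three age classes with equal durations of five
years. Let the Leslie matrix for this population be

\[
L = \begin{bmatrix} 0 & 4 & 3 \\ \tfrac12 & 0 & 0 \\ 0 & \tfrac14 & 0 \end{bmatrix}
\]

If there are initially 1000 females in each of the three age classes, then from Equation (3)
we have

\[
\mathbf{x}^{(0)} = \begin{bmatrix} 1{,}000 \\ 1{,}000 \\ 1{,}000 \end{bmatrix}
\]

\[
\mathbf{x}^{(1)} = L\mathbf{x}^{(0)} =
\begin{bmatrix} 0 & 4 & 3 \\ \tfrac12 & 0 & 0 \\ 0 & \tfrac14 & 0 \end{bmatrix}
\begin{bmatrix} 1{,}000 \\ 1{,}000 \\ 1{,}000 \end{bmatrix}
= \begin{bmatrix} 7{,}000 \\ 500 \\ 250 \end{bmatrix}
\]

\[
\mathbf{x}^{(2)} = L\mathbf{x}^{(1)} =
\begin{bmatrix} 0 & 4 & 3 \\ \tfrac12 & 0 & 0 \\ 0 & \tfrac14 & 0 \end{bmatrix}
\begin{bmatrix} 7{,}000 \\ 500 \\ 250 \end{bmatrix}
= \begin{bmatrix} 2{,}750 \\ 3{,}500 \\ 125 \end{bmatrix}
\]

\[
\mathbf{x}^{(3)} = L\mathbf{x}^{(2)} =
\begin{bmatrix} 0 & 4 & 3 \\ \tfrac12 & 0 & 0 \\ 0 & \tfrac14 & 0 \end{bmatrix}
\begin{bmatrix} 2{,}750 \\ 3{,}500 \\ 125 \end{bmatrix}
= \begin{bmatrix} 14{,}375 \\ 1{,}375 \\ 875 \end{bmatrix}
\]

Thus, after 15 years there are 14,375 females between 0 and 5 years of age, 1375 females
between 5 and 10 years of age, and 875 females between 10 and 15 years of age.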
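
The projection in Example 1 is easy to reproduce by machine. The sketch below, in Python with exact fractions, builds the Leslie matrix from its birth parameters $a_i$ and survival parameters $b_i$ and applies Equation (3) repeatedly; the function names `leslie_matrix` and `project` are illustrative choices, not notation from the text.

```python
from fractions import Fraction as F

def leslie_matrix(a, b):
    """Build the n x n Leslie matrix of Equation (4) from a_1..a_n and b_1..b_(n-1)."""
    n = len(a)
    L = [[0] * n for _ in range(n)]
    L[0] = list(a)                      # first row holds the birth parameters
    for i, b_i in enumerate(b):
        L[i + 1][i] = b_i               # subdiagonal holds the survival fractions
    return L

def project(L, x0, k):
    """Return the list [x^(0), x^(1), ..., x^(k)] using x^(j) = L x^(j-1)."""
    x = list(x0)
    history = [x]
    for _ in range(k):
        x = [sum(l * v for l, v in zip(row, x)) for row in L]
        history.append(x)
    return history

L = leslie_matrix(a=[0, 4, 3], b=[F(1, 2), F(1, 4)])
history = project(L, [1000, 1000, 1000], 3)
for k, x in enumerate(history):
    print(k, [int(v) for v in x])
```

Each printed row is one five-year step, so the last row corresponds to the age distribution after 15 years.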


Although Equation (5) gives the age distribution of the population at any time, it does 
not immediately give a general picture of the dynamics of the growth process. For this we 
need to investigate the eigenvalues and eigenvectors of the Leslie matrix. The eigenvalues 
of L are the roots of its characteristic polynomial. As we ask you to verify in Exercise 2, 
this characteristic polynomial is 

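\[
\begin{aligned}
p(\lambda) &= |\lambda I - L| \\
&= \lambda^n - a_1\lambda^{n-1} - a_2 b_1\lambda^{n-2} - a_3 b_1 b_2\lambda^{n-3} - \cdots - a_n b_1 b_2 \cdots b_{n-1}
\end{aligned}
\]

To analyze the roots of this polynomial, it will be convenient to introduce the function

\[
q(\lambda) = \frac{a_1}{\lambda} + \frac{a_2 b_1}{\lambda^2} + \frac{a_3 b_1 b_2}{\lambda^3} + \cdots + \frac{a_n b_1 b_2 \cdots b_{n-1}}{\lambda^n} \tag{6}
\]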
Using this function, the characteristic equation $p(\lambda) = 0$ can be written (verify)

\[
q(\lambda) = 1 \quad \text{for } \lambda \ne 0 \tag{7}
\]

Because all the $a_i$ and $b_i$ are nonnegative, we see that $q(\lambda)$ is monotonically decreasing for
$\lambda$ greater than zero. Furthermore, $q(\lambda)$ has a vertical asymptote at $\lambda = 0$ and approaches
zero as $\lambda \to \infty$. Consequently, as Figure 10.16.1 indicates, there is a unique $\lambda$, say $\lambda = \lambda_1$,
such that $q(\lambda_1) = 1$. That is, the matrix $L$ has a unique positive eigenvalue. It can also
be shown (see Exercise 3) that $\lambda_1$ has multiplicity 1; that is, $\lambda_1$ is not a repeated root of
the characteristic equation. Although we omit the computational details, you can verify
that an eigenvector corresponding to $\lambda_1$ is

\[
\mathbf{x}_1 = \begin{bmatrix}
1 \\
b_1/\lambda_1 \\
b_1 b_2/\lambda_1^2 \\
b_1 b_2 b_3/\lambda_1^3 \\
\vdots \\
b_1 b_2 \cdots b_{n-1}/\lambda_1^{n-1}
\end{bmatrix} \tag{8}
\]

Because $\lambda_1$ has multiplicity 1, its corresponding eigenspace has dimension 1 (Exercise 3),
and so any eigenvector corresponding to it is some multiple of $\mathbf{x}_1$. We can summarize
these results in the following theorem.


THEOREM 10.16.1 Existence of a Positive Eigenvalue

A Leslie matrix $L$ has a unique positive eigenvalue $\lambda_1$. This eigenvalue has multiplicity
1 and an eigenvector $\mathbf{x}_1$ all of whose entries are positive.
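
Because $q(\lambda)$ is monotonically decreasing for $\lambda > 0$, the equation $q(\lambda) = 1$ in (7) can be solved by bisection. The following Python sketch is one way to do this under that assumption; the function names `q`, `positive_eigenvalue`, and `eigenvector`, the tolerance, and the bracket-doubling strategy are illustrative choices rather than anything prescribed by the text. It is applied to the Leslie matrix of Example 1, whose parameters are $a = (0, 4, 3)$ and $b = (\tfrac12, \tfrac14)$.

```python
def q(lam, a, b):
    """Evaluate q(lambda) from Equation (6) for birth rates a and survival rates b."""
    total = 0.0
    survival = 1.0                      # running product b_1 b_2 ... b_(i-1)
    for i, a_i in enumerate(a, start=1):
        total += a_i * survival / lam ** i
        if i <= len(b):
            survival *= b[i - 1]
    return total

def positive_eigenvalue(a, b, tol=1e-12):
    """Solve q(lambda) = 1 for lambda > 0 by bisection (q is decreasing)."""
    lo, hi = 0.0, 1.0
    while q(hi, a, b) > 1.0:            # widen the bracket until q(hi) <= 1
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if q(mid, a, b) > 1.0:
            lo = mid                    # root lies to the right of mid
        else:
            hi = mid                    # root lies at or to the left of mid
    return 0.5 * (lo + hi)

def eigenvector(lam, b):
    """Eigenvector (8): entries 1, b_1/lam, b_1 b_2/lam^2, ..."""
    x = [1.0]
    for b_i in b:
        x.append(x[-1] * b_i / lam)
    return x

a, b = [0, 4, 3], [0.5, 0.25]           # parameters of the matrix in Example 1
lam1 = positive_eigenvalue(a, b)
x1 = eigenvector(lam1, b)
print(lam1, x1)
```

Bisection is used here only because it relies on nothing beyond the monotonicity of $q$; any other root-finding method would serve equally well.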


We will now show that the long-term behavior of the age distribution of the population
is determined by the positive eigenvalue $\lambda_1$ and its eigenvector $\mathbf{x}_1$.

In Exercise 9 we ask you to prove the following result.


THEOREM 10.16.2 Eigenvalues of a Leslie Matrix

If $\lambda_1$ is the unique positive eigenvalue of a Leslie matrix $L$, and $\lambda_k$ is any other real or
complex eigenvalue of $L$, then $|\lambda_k| \le \lambda_1$.


For our purposes the conclusion in Theorem 10.16.2 is not strong enough; we need $\lambda_1$ to
satisfy $|\lambda_k| < \lambda_1$. In this case $\lambda_1$ would be called the dominant eigenvalue of $L$. However,
as the following example shows, not all Leslie matrices satisfy this condition.


► EXAMPLE 2 Leslie Matrix with No Dominant Eigenvalue

Let

\[
L = \begin{bmatrix} 0 & 0 & 6 \\ \tfrac12 & 0 & 0 \\ 0 & \tfrac13 & 0 \end{bmatrix}
\]

Then the characteristic polynomial of $L$ is

\[
p(\lambda) = |\lambda I - L| = \lambda^3 - 1
\]

The eigenvalues of $L$ are thus the solutions of $\lambda^3 = 1$, namely,

\[
\lambda = 1, \qquad -\tfrac12 + \tfrac{\sqrt3}{2}i, \qquad -\tfrac12 - \tfrac{\sqrt3}{2}i
\]

All three eigenvalues have absolute value 1, so the unique positive eigenvalue $\lambda_1 = 1$ is
not dominant. Note that this matrix has the property that $L^3 = I$. This means that for
any choice of the initial age distribution $\mathbf{x}^{(0)}$, we have

\[
\mathbf{x}^{(0)} = \mathbf{x}^{(3)} = \mathbf{x}^{(6)} = \cdots = \mathbf{x}^{(3k)} = \cdots
\]

The age distribution vector thus oscillates with a period of three time units. Such
oscillations (or population waves, as they are called) could not occur if $\lambda_1$ were dominant, as
we will see below.
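
The periodicity in Example 2 can be confirmed directly. The following Python sketch, using exact fractions, computes $L^3$ and follows one population through three time steps; the helper names `matmul` and `apply` and the starting vector of 1000 females per class are assumptions made for illustration.

```python
from fractions import Fraction as F

# Leslie matrix of Example 2.
L = [[0, 0, 6],
     [F(1, 2), 0, 0],
     [0, F(1, 3), 0]]

def matmul(A, B):
    """Return the product of two square matrices of the same size."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def apply(L, x):
    """Return the matrix-vector product L x."""
    return [sum(l * v for l, v in zip(row, x)) for row in L]

L3 = matmul(L, matmul(L, L))
identity = [[int(i == j) for j in range(3)] for i in range(3)]
print(L3 == identity)                   # checks L^3 = I exactly

x = [F(1000), F(1000), F(1000)]         # illustrative initial distribution
wave = [x]
for _ in range(3):
    wave.append(apply(L, wave[-1]))
for k, v in enumerate(wave):
    print(k, [str(e) for e in v])
```

The printed vectors differ at steps 1 and 2 and return to the starting vector at step 3, which is the population wave described above.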


It is beyond the scope of this book to discuss necessary and sufficient conditions for
$\lambda_1$ to be a dominant eigenvalue. However, we will state the following sufficient condition
without proof.


THEOREM 10.16.3 Dominant Eigenvalue

If two successive entries $a_i$ and $a_{i+1}$ in the first row of a Leslie matrix $L$ are nonzero,
then the positive eigenvalue of $L$ is dominant.


Thus, if the female population has two successive fertile age classes, then its Leslie matrix 
has a dominant eigenvalue. This is always the case for realistic populations if the duration 
of the age classes is sufficiently small. Note that in Example 2 there is only one fertile age 
class (the third), so the condition of Theorem 10.16.3 is not satisfied. In what follows, 
we always assume that the condition of Theorem 10.16.3 is satisfied. 

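Let us assume that $L$ is diagonalizable. This is not really necessary for the
conclusions we will draw, but it does simplify the arguments. In this case, $L$ has $n$
eigenvalues, $\lambda_1, \lambda_2, \ldots, \lambda_n$, not necessarily distinct, and $n$ linearly independent eigenvectors,
$\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n$, corresponding to them. In this listing we place the dominant eigenvalue
$\lambda_1$ first. We construct a matrix $P$ whose columns are the eigenvectors of $L$:

\[
P = [\mathbf{x}_1 \mid \mathbf{x}_2 \mid \mathbf{x}_3 \mid \cdots \mid \mathbf{x}_n]
\]

The diagonalization of $L$ is then given by the equation

\[
L = P \begin{bmatrix}
\lambda_1 & 0 & 0 & \cdots & 0 \\
0 & \lambda_2 & 0 & \cdots & 0 \\
\vdots & \vdots & \vdots & & \vdots \\
0 & 0 & 0 & \cdots & \lambda_n
\end{bmatrix} P^{-1}
\]

From this it follows that

\[
L^k = P \begin{bmatrix}
\lambda_1^k & 0 & 0 & \cdots & 0 \\
0 & \lambda_2^k & 0 & \cdots & 0 \\
\vdots & \vdots & \vdots & & \vdots \\
0 & 0 & 0 & \cdots & \lambda_n^k
\end{bmatrix} P^{-1}
\]

for $k = 1, 2, \ldots$. For any initial age distribution vector $\mathbf{x}^{(0)}$, we then have

\[
L^k\mathbf{x}^{(0)} = P \begin{bmatrix}
\lambda_1^k & 0 & 0 & \cdots & 0 \\
0 & \lambda_2^k & 0 & \cdots & 0 \\
\vdots & \vdots & \vdots & & \vdots \\
0 & 0 & 0 & \cdots & \lambda_n^k
\end{bmatrix} P^{-1}\mathbf{x}^{(0)}
\]

for $k = 1, 2, \ldots$. Dividing both sides of this equation by $\lambda_1^k$ and using the fact that
$\mathbf{x}^{(k)} = L^k\mathbf{x}^{(0)}$, we have

\[
\frac{1}{\lambda_1^k}\mathbf{x}^{(k)} = P \begin{bmatrix}
1 & 0 & 0 & \cdots & 0 \\
0 & \left(\dfrac{\lambda_2}{\lambda_1}\right)^k & 0 & \cdots & 0 \\
\vdots & \vdots & \vdots & & \vdots \\
0 & 0 & 0 & \cdots & \left(\dfrac{\lambda_n}{\lambda_1}\right)^k
\end{bmatrix} P^{-1}\mathbf{x}^{(0)} \tag{9}
\]

Because $\lambda_1$ is the dominant eigenvalue, we have $|\lambda_i/\lambda_1| < 1$ for $i = 2, 3, \ldots, n$. It
follows that

\[
(\lambda_i/\lambda_1)^k \to 0 \quad \text{as } k \to \infty \quad \text{for } i = 2, 3, \ldots, n
\]

Using this fact, we can take the limit of both sides of (9) to obtain

\[
\lim_{k\to\infty} \left\{ \frac{1}{\lambda_1^k}\mathbf{x}^{(k)} \right\} = P \begin{bmatrix}
1 & 0 & 0 & \cdots & 0 \\
0 & 0 & 0 & \cdots & 0 \\
\vdots & \vdots & \vdots & & \vdots \\
0 & 0 & 0 & \cdots & 0
\end{bmatrix} P^{-1}\mathbf{x}^{(0)} \tag{10}
\]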


Let us denote the first entry of the column vector $P^{-1}\mathbf{x}^{(0)}$ by the constant $c$. As we ask
you to show in Exercise 4, the right side of (10) can be written as $c\mathbf{x}_1$, where $c$ is a positive
constant that depends only on the initial age distribution vector $\mathbf{x}^{(0)}$. Thus, (10) becomes

\[
\lim_{k\to\infty} \left\{ \frac{1}{\lambda_1^k}\mathbf{x}^{(k)} \right\} = c\mathbf{x}_1 \tag{11}
\]

Equation (11) gives us the approximation

\[
\mathbf{x}^{(k)} \simeq c\lambda_1^k\mathbf{x}_1 \tag{12}
\]

for large values of $k$. From (12) we also have

\[
\mathbf{x}^{(k-1)} \simeq c\lambda_1^{k-1}\mathbf{x}_1 \tag{13}
\]

Comparing Equations (12) and (13), we see that

\[
\mathbf{x}^{(k)} \simeq \lambda_1\mathbf{x}^{(k-1)} \tag{14}
\]

for large values of $k$. This means that for large values of time, each age distribution vector
is a scalar multiple of the preceding age distribution vector, the scalar being the positive
eigenvalue of the Leslie matrix. Consequently, the proportion of females in each of the
age classes becomes constant. As we will see in the following example, these limiting
proportions can be determined from the eigenvector $\mathbf{x}_1$.
► EXAMPLE 3 Example 1 Revisited

The Leslie matrix in Example 1 was

\[
L = \begin{bmatrix} 0 & 4 & 3 \\ \tfrac12 & 0 & 0 \\ 0 & \tfrac14 & 0 \end{bmatrix}
\]

Its characteristic polynomial is $p(\lambda) = \lambda^3 - 2\lambda - \tfrac38$, and you can verify that the positive
eigenvalue is $\lambda_1 = \tfrac32$. From (8) the corresponding eigenvector $\mathbf{x}_1$ is

\[
\mathbf{x}_1 = \begin{bmatrix} 1 \\ b_1/\lambda_1 \\ b_1 b_2/\lambda_1^2 \end{bmatrix}
= \begin{bmatrix} 1 \\ \dfrac{1/2}{3/2} \\ \dfrac{(1/2)(1/4)}{(3/2)^2} \end{bmatrix}
= \begin{bmatrix} 1 \\ \tfrac13 \\ \tfrac1{18} \end{bmatrix}
\]

From (14) we have

\[
\mathbf{x}^{(k)} \simeq \tfrac32\,\mathbf{x}^{(k-1)}
\]

for large values of $k$. Hence, every five years the number of females in each of the three
classes will increase by about 50%, as will the total number of females in the population.
From (12) we have

\[
\mathbf{x}^{(k)} \simeq c\left(\tfrac32\right)^k \begin{bmatrix} 1 \\ \tfrac13 \\ \tfrac1{18} \end{bmatrix}
\]

Consequently, eventually the females will be distributed among the three age classes in
the ratios $1 : \tfrac13 : \tfrac1{18}$. This corresponds to a distribution of 72% of the females in the first
age class, 24% of the females in the second age class, and 4% of the females in the third
age class.
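
The claims in Example 3 can be checked in a few lines. The Python sketch below verifies the eigenpair exactly, normalizes $\mathbf{x}_1$ to obtain the limiting proportions, and then projects a population far enough forward to compare with approximation (14). The starting vector of 1000 females per class (borrowed from Example 1) and the choice of 120 steps are assumptions made for illustration; convergence is fairly slow here because the other eigenvalues are not much smaller than $\tfrac32$ in absolute value.

```python
from fractions import Fraction as F

# Leslie matrix of Examples 1 and 3.
L = [[0, 4, 3],
     [F(1, 2), 0, 0],
     [0, F(1, 4), 0]]

def apply(L, x):
    """Return the matrix-vector product L x."""
    return [sum(l * v for l, v in zip(row, x)) for row in L]

lam1 = F(3, 2)
x1 = [F(1), F(1, 3), F(1, 18)]

# Exact check that L x1 = lam1 x1.
is_eigenpair = apply(L, x1) == [lam1 * v for v in x1]

# Limiting proportions: scale x1 so that its entries sum to 1.
proportions = [v / sum(x1) for v in x1]

# Project 1000 females per class forward to x^(119) and x^(120).
x_prev = [F(1000)] * 3
for _ in range(119):
    x_prev = apply(L, x_prev)
x_curr = apply(L, x_prev)
growth = float(sum(x_curr) / sum(x_prev))          # ratio of total populations
shares = [float(v / sum(x_curr)) for v in x_curr]  # fraction in each age class

print(is_eigenpair)
print([str(p) for p in proportions])
print(growth, shares)
```

The exact proportions are $\tfrac{18}{25}$, $\tfrac{6}{25}$, and $\tfrac{1}{25}$, which are the 72%, 24%, and 4% quoted in the example.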


► EXAMPLE 4 Female Age Distribution for Humans

In this example we use birth and death parameters from the year 1965 for Canadian
females. Because few women over 50 years of age bear children, we restrict ourselves
to the portion of the female population between 0 and 50 years of age. The data are
for 5-year age classes, so there are a total of 10 age classes. Rather than writing out the
10 × 10 Leslie matrix in full, we list the birth and death parameters as follows:

Age Interval    a_i        b_i
[0, 5)          0.00000    0.99651
[5, 10)         0.00024    0.99820
[10, 15)        0.05861    0.99802
[15, 20)        0.28608    0.99729
[20, 25)        0.44791    0.99694
[25, 30)        0.36399    0.99621
[30, 35)        0.22259    0.99460
[35, 40)        0.10457    0.99184
[40, 45)        0.02826    0.98700
[45, 50)        0.00240    -

Using numerical techniques, we can approximate the positive eigenvalue and
corresponding eigenvector by

\[
\lambda_1 = 1.07622 \quad \text{and} \quad
\mathbf{x}_1 = \begin{bmatrix}
1.00000 \\ 0.92594 \\ 0.85881 \\ 0.79641 \\ 0.73800 \\ 0.68364 \\ 0.63281 \\ 0.58482 \\ 0.53897 \\ 0.49429
\end{bmatrix}
\]

Thus, if Canadian women continued to reproduce and die as they did in 1965, eventually
every 5 years their numbers would increase by 7.622%. From the eigenvector $\mathbf{x}_1$, we see
that, in the limit, for every 100,000 females between 0 and 5 years of age, there will be
92,594 females between 5 and 10 years of age, 85,881 females between 10 and 15 years
of age, and so forth.
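
The text does not say which numerical technique was used in Example 4. One possibility, sketched below in Python, is to solve $q(\lambda) = 1$ from Equations (6) and (7) by bisection and then build the eigenvector from formula (8). The bracket $[1, 2]$ and the fixed count of 60 bisection steps are assumptions chosen for this data set, not part of the original example.

```python
# Birth parameters a_i and survival parameters b_i from the table in Example 4.
a = [0.00000, 0.00024, 0.05861, 0.28608, 0.44791,
     0.36399, 0.22259, 0.10457, 0.02826, 0.00240]
b = [0.99651, 0.99820, 0.99802, 0.99729, 0.99694,
     0.99621, 0.99460, 0.99184, 0.98700]

def q(lam):
    """Evaluate q(lambda) from Equation (6) for the data above."""
    total, survival = 0.0, 1.0          # survival = b_1 b_2 ... b_(i-1)
    for i, a_i in enumerate(a):
        total += a_i * survival / lam ** (i + 1)
        if i < len(b):
            survival *= b[i]
    return total

# q is decreasing, and for this data q(1) > 1 > q(2), so the root lies in [1, 2].
lo, hi = 1.0, 2.0
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if q(mid) > 1.0:
        lo = mid
    else:
        hi = mid
lam1 = 0.5 * (lo + hi)

# Eigenvector from formula (8): 1, b_1/lam1, b_1 b_2/lam1^2, ...
x1 = [1.0]
for b_i in b:
    x1.append(x1[-1] * b_i / lam1)

print(round(lam1, 5))
print([round(v, 5) for v in x1])
```

The computed values can be compared entry by entry with the eigenvalue and eigenvector quoted in the example.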


Let us look again at Equation (12), which gives the age distribution vector of the
population for large times:

\[
\mathbf{x}^{(k)} \simeq c\lambda_1^k\mathbf{x}_1 \tag{15}
\]

Three cases arise according to the value of the positive eigenvalue $\lambda_1$:

(i) The population is eventually increasing if $\lambda_1 > 1$.

(ii) The population is eventually decreasing if $\lambda_1 < 1$.

(iii) The population eventually stabilizes if $\lambda_1 = 1$.

The case $\lambda_1 = 1$ is particularly interesting because it determines a population that has
zero population growth. For any initial age distribution, the population approaches a
limiting age distribution that is some multiple of the eigenvector $\mathbf{x}_1$. From Equations (6)
and (7), we see that $\lambda_1 = 1$ is an eigenvalue if and only if

\[
a_1 + a_2 b_1 + a_3 b_1 b_2 + \cdots + a_n b_1 b_2 \cdots b_{n-1} = 1 \tag{16}
\]

The expression

\[
R = a_1 + a_2 b_1 + a_3 b_1 b_2 + \cdots + a_n b_1 b_2 \cdots b_{n-1} \tag{17}
\]

is called the net reproduction rate of the population. (See Exercise 5 for a demographic 
interpretation of R.) Thus, we can say that a population has zero population growth if 
and only if its net reproduction rate is 1. 
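
Formula (17) translates directly into a short loop. The Python sketch below evaluates $R$ for the Leslie matrices of Examples 1 and 2; the function name `net_reproduction_rate` is an illustrative choice, and the parameters are read off the matrices given in those examples.

```python
from fractions import Fraction as F

def net_reproduction_rate(a, b):
    """Evaluate R = a_1 + a_2 b_1 + a_3 b_1 b_2 + ... from Equation (17)."""
    R = 0
    survival = 1                        # running product b_1 b_2 ... b_(i-1)
    for i, a_i in enumerate(a):
        R += a_i * survival
        if i < len(b):
            survival *= b[i]
    return R

# Example 1: first row (0, 4, 3), survival fractions 1/2 and 1/4.
R1 = net_reproduction_rate([0, 4, 3], [F(1, 2), F(1, 4)])

# Example 2: first row (0, 0, 6), survival fractions 1/2 and 1/3.
R2 = net_reproduction_rate([0, 0, 6], [F(1, 2), F(1, 3)])

print(R1, R2)
```

The first value exceeds 1, in line with $\lambda_1 = \tfrac32 > 1$ in Example 3, and the second equals 1, in line with $\lambda_1 = 1$ in Example 2.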


Exercise Set 10.16 

1. Suppose that a certain animal population is divided into two 
age classes and has a Leslie matrix 



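(a) Calculate the positive eigenvalue $\lambda_1$ of $L$ and the
corresponding eigenvector $\mathbf{x}_1$.

(b) Beginning with the initial age distribution vector

\[
\mathbf{x}^{(0)} = \begin{bmatrix} 100 \\ 0 \end{bmatrix}
\]

calculate $\mathbf{x}^{(1)}$, $\mathbf{x}^{(2)}$, $\mathbf{x}^{(3)}$, $\mathbf{x}^{(4)}$, and $\mathbf{x}^{(5)}$, rounding off to the
nearest integer when necessary.

(c) Calculate $\mathbf{x}^{(6)}$ using the exact formula $\mathbf{x}^{(6)} = L\mathbf{x}^{(5)}$ and
using the approximation formula $\mathbf{x}^{(6)} \simeq \lambda_1\mathbf{x}^{(5)}$.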


2. Find the characteristic polynomial of a general Leslie matrix
given by Equation (4).

3. (a) Show that the positive eigenvalue $\lambda_1$ of a Leslie matrix is
always simple. Recall that a root $\lambda_0$ of a polynomial $q(\lambda)$
is simple if and only if $q'(\lambda_0) \ne 0$.

(b) Show that the eigenspace corresponding to $\lambda_1$ has dimension 1.

4. Show that the right side of Equation (10) is $c\mathbf{x}_1$, where $c$ is the
first entry of the column vector $P^{-1}\mathbf{x}^{(0)}$.

5. Show that the net reproduction rate $R$, defined by (17), can be
interpreted as the average number of daughters born to a single
female during her expected lifetime.

6. Show that a population is eventually decreasing if and only if its
net reproduction rate is less than 1. Similarly, show that a population
is eventually increasing if and only if its net reproduction
rate is greater than 1.

7. Calculate the net reproduction rate of the animal population in
Example 1.

8. (For readers with a hand calculator) Calculate the net reproduction
rate of the Canadian female population in Example 4.

9. (For readers who have read Sections 10.1-10.3) Prove Theorem 10.16.2.
[Hint: Write $\lambda_k = re^{i\theta}$, substitute into (7), take
the real parts of both sides, and show that $r \le \lambda_1$.]

Working with Technology

The following exercises are designed to be solved using a technology
utility. Typically, this will be MATLAB, Mathematica, Maple,
Derive, or Mathcad, but it may also be some other type of linear
algebra software or a scientific calculator with some linear algebra
capabilities. For each exercise you will need to read the relevant
documentation for the particular utility you are using. The goal
of these exercises is to provide you with a basic proficiency with
your technology utility. Once you have mastered the techniques
in these exercises, you will be able to use your technology utility
to solve many of the problems in the regular exercise sets.

T1. Consider the sequence of Leslie matrices

\[
L_2 = \begin{bmatrix} 0 & a \\ b_1 & 0 \end{bmatrix}, \quad
L_3 = \begin{bmatrix} 0 & 0 & a \\ b_1 & 0 & 0 \\ 0 & b_2 & 0 \end{bmatrix}, \quad
L_4 = \begin{bmatrix} 0 & 0 & 0 & a \\ b_1 & 0 & 0 & 0 \\ 0 & b_2 & 0 & 0 \\ 0 & 0 & b_3 & 0 \end{bmatrix}, \quad
L_5 = \begin{bmatrix} 0 & 0 & 0 & 0 & a \\ b_1 & 0 & 0 & 0 & 0 \\ 0 & b_2 & 0 & 0 & 0 \\ 0 & 0 & b_3 & 0 & 0 \\ 0 & 0 & 0 & b_4 & 0 \end{bmatrix}, \ \ldots
\]

(a) Use a computer to show that

\[
L_2^2 = I_2, \quad L_3^3 = I_3, \quad L_4^4 = I_4, \quad L_5^5 = I_5, \ \ldots
\]

for a suitable choice of $a$ in terms of $b_1, b_2, \ldots, b_{n-1}$.

(b) From your results in part (a), conjecture a relationship between
$a$ and $b_1, b_2, \ldots, b_{n-1}$ that will make $L_n^n = I_n$, where

\[
L_n = \begin{bmatrix}
0 & 0 & 0 & \cdots & 0 & a \\
b_1 & 0 & 0 & \cdots & 0 & 0 \\
0 & b_2 & 0 & \cdots & 0 & 0 \\
0 & 0 & b_3 & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & & \vdots & \vdots \\
0 & 0 & 0 & \cdots & b_{n-1} & 0
\end{bmatrix}
\]

(c) Determine an expression for $p_n(\lambda) = |\lambda I_n - L_n|$ and use it to
show that all eigenvalues of $L_n$ satisfy $|\lambda| = 1$ when $a$ and $b_1$,
$b_2, \ldots, b_{n-1}$ are related by the equation determined in part (b).

T2. Consider the sequence of Leslie matrices

\[
L_2 = \begin{bmatrix} a & ap \\ b & 0 \end{bmatrix}, \quad
L_3 = \begin{bmatrix} a & ap & ap^2 \\ b & 0 & 0 \\ 0 & b & 0 \end{bmatrix}, \quad
L_4 = \begin{bmatrix} a & ap & ap^2 & ap^3 \\ b & 0 & 0 & 0 \\ 0 & b & 0 & 0 \\ 0 & 0 & b & 0 \end{bmatrix},
\]

\[
L_5 = \begin{bmatrix} a & ap & ap^2 & ap^3 & ap^4 \\ b & 0 & 0 & 0 & 0 \\ 0 & b & 0 & 0 & 0 \\ 0 & 0 & b & 0 & 0 \\ 0 & 0 & 0 & b & 0 \end{bmatrix}, \ \ldots, \quad
L_n = \begin{bmatrix}
a & ap & ap^2 & \cdots & ap^{n-2} & ap^{n-1} \\
b & 0 & 0 & \cdots & 0 & 0 \\
0 & b & 0 & \cdots & 0 & 0 \\
0 & 0 & b & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & & \vdots & \vdots \\
0 & 0 & 0 & \cdots & b & 0
\end{bmatrix}
\]

where $0 < p < 1$, $0 < b < 1$, and $1 < a$.

(a) Choose a value for $n$ (say, $n = 8$). For various values of $a$, $b$,
and $p$, use a computer to determine the dominant eigenvalue
of $L_n$, and then compare your results to the value of $a + bp$.

(b) Show that

\[
p_n(\lambda) = |\lambda I_n - L_n| = \lambda^n - a\left(\frac{\lambda^n - (bp)^n}{\lambda - bp}\right)
\]

which means that the eigenvalues of $L_n$ must satisfy

\[
\lambda^{n+1} - (a + bp)\lambda^n + a(bp)^n = 0
\]

(c) Can you now provide a rough proof to explain the fact that
$\lambda_1 \simeq a + bp$?

T3. Suppose that a population of mice has a Leslie matrix $L$ over
a 1-month period and an initial age distribution vector $\mathbf{x}^{(0)}$ given
by

\[
L = \begin{bmatrix}
0 & 0 & \tfrac12 & \tfrac45 & \tfrac3{10} & 0 \\
\tfrac45 & 0 & 0 & 0 & 0 & 0 \\
0 & \tfrac9{10} & 0 & 0 & 0 & 0 \\
0 & 0 & \tfrac9{10} & 0 & 0 & 0 \\
0 & 0 & 0 & \tfrac45 & 0 & 0 \\
0 & 0 & 0 & 0 & \tfrac3{10} & 0
\end{bmatrix}
\quad \text{and} \quad
\mathbf{x}^{(0)} = \begin{bmatrix} 50 \\ 40 \\ 30 \\ 20 \\ 10 \\ 5 \end{bmatrix}
\]

(a) Compute the net reproduction rate of the population.

(b) Compute the age distribution vector after 100 months and 101
months, and show that the vector after 101 months is approximately
a scalar multiple of the vector after 100 months.

(c) Compute the dominant eigenvalue of $L$ and its corresponding
eigenvector. How are they related to your results in part (b)?

(d) Suppose you wish to control the mouse population by feeding
it a substance that decreases its age-specific birthrates (the
entries in the first row of $L$) by a constant fraction. What
range of fractions would cause the population eventually to
decrease?


10.17 Harvesting of Animal Populations 

In this section we employ the Leslie matrix model of population growth to model the 
sustainable harvesting of an animal population. We also examine the effect of harvesting 
different fractions of different age groups. 

Age-Specific Population Growth (Section 10.16) 


Harvesting

In Section 10.16 we used the Leslie matrix model to examine the growth of a female 
population that was divided into discrete age classes. In this section, we investigate 
the effects of harvesting an animal population growing according to such a model. By 
harvesting we mean the removal of animals from the population. (The word harvesting 
is not necessarily a euphemism for “slaughtering”; the animals may be removed from 
the population for other purposes.) 

In this section we restrict ourselves to sustainable harvesting policies. By this we mean 
the following: 


DEFINITION 1 A harvesting policy in which an animal population is periodically 
harvested is said to be sustainable if the yield of each harvest is the same and the age 
distribution of the population remaining after each harvest is the same. 


Thus, the animal population is not depleted by a sustainable harvesting policy; only the 
excess growth is removed. 

As in Section 10.16, we will discuss only the females of the population. If the number 
of males in each age class is equal to the number of females — a reasonable assumption 
for many populations — then our harvesting policies will also apply to the male portion 
of the population. 

The Harvesting Model

Figure 10.17.1 illustrates the basic idea of the model. We begin with a population having 
a particular age distribution. It undergoes a growth period that will be described by the 
Leslie matrix. At the end of the growth period, a certain fraction of each age class is 
harvested in such a way that the unharvested population has the same age distribution 
as the original population. This cycle repeats after each harvest so that the yield is 
sustainable. The duration of the harvest is assumed to be short in comparison with the 
growth period so that any growth or change in the population during the harvest period 
can be neglected. 
[Figure 10.17.1: the population before the growth period; growth; the population after the growth period, divided into a part that is harvested and a part that is not harvested]



To describe this harvesting model mathematically, let

\[
\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}
\]

be the age distribution vector of the population at the beginning of the growth period.
Thus $x_i$ is the number of females in the $i$th class left unharvested. As in Section 10.16,
we require that the duration of each age class be identical with the duration of the growth
period. For example, if the population is harvested once a year, then the population is
divided into 1-year age classes.

If L is the Leslie matrix describing the growth of the population, then the vector 
Lx is the age distribution vector of the population at the end of the growth period, 
immediately before the periodic harvest. Let $h_i$, for $i = 1, 2, \ldots, n$, be the fraction of 
females from the $i$th class that is harvested. We use these $n$ numbers to form an $n \times n$ 
diagonal matrix 

$$H = \begin{bmatrix}
h_1 & 0 & 0 & \cdots & 0 \\
0 & h_2 & 0 & \cdots & 0 \\
0 & 0 & h_3 & \cdots & 0 \\
\vdots & \vdots & \vdots & & \vdots \\
0 & 0 & 0 & \cdots & h_n
\end{bmatrix}$$

which we will call the harvesting matrix. By definition, we have 

$$0 \le h_i \le 1 \qquad (i = 1, 2, \ldots, n)$$

That is, we can harvest none ($h_i = 0$), all ($h_i = 1$), or some fraction ($0 < h_i < 1$) of 
each of the $n$ classes. Because the number of females in the $i$th class immediately before 
each harvest is the $i$th entry $(L\mathbf{x})_i$ of the vector $L\mathbf{x}$, the $i$th entry of the column vector 

$$HL\mathbf{x} = \begin{bmatrix} h_1 (L\mathbf{x})_1 \\ h_2 (L\mathbf{x})_2 \\ \vdots \\ h_n (L\mathbf{x})_n \end{bmatrix}$$

is the number of females harvested from the $i$th class. 
From the definition of a sustainable harvesting policy, we have 

[age distribution at end of growth period] - [harvest] = [age distribution at beginning of growth period] 

or, mathematically, 

$$L\mathbf{x} - HL\mathbf{x} = \mathbf{x} \tag{1}$$

If we write Equation (1) in the form 

$$(I - H)L\mathbf{x} = \mathbf{x} \tag{2}$$

we see that $\mathbf{x}$ must be an eigenvector of the matrix $(I - H)L$ corresponding to the 
eigenvalue 1. As we will now show, this places certain restrictions on the values of $h_i$ and $\mathbf{x}$. 
Suppose that the Leslie matrix of the population is 

$$L = \begin{bmatrix}
a_1 & a_2 & a_3 & \cdots & a_{n-1} & a_n \\
b_1 & 0 & 0 & \cdots & 0 & 0 \\
0 & b_2 & 0 & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & & \vdots & \vdots \\
0 & 0 & 0 & \cdots & b_{n-1} & 0
\end{bmatrix}$$

Then the matrix $(I - H)L$ is (verify) 

$$(I - H)L = \begin{bmatrix}
(1 - h_1)a_1 & (1 - h_1)a_2 & (1 - h_1)a_3 & \cdots & (1 - h_1)a_{n-1} & (1 - h_1)a_n \\
(1 - h_2)b_1 & 0 & 0 & \cdots & 0 & 0 \\
0 & (1 - h_3)b_2 & 0 & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & & \vdots & \vdots \\
0 & 0 & 0 & \cdots & (1 - h_n)b_{n-1} & 0
\end{bmatrix} \tag{3}$$

Thus, we see that $(I - H)L$ is a matrix with the same mathematical form as a Leslie 
matrix. In Section 10.16 we showed that a necessary and sufficient condition for a Leslie 
matrix to have 1 as an eigenvalue is that its net reproduction rate also be 1 [see Eq. (16) of 
Section 10.16]. Calculating the net reproduction rate of $(I - H)L$ and setting it equal 
to 1, we obtain (verify) 

$$(1 - h_1)\bigl[a_1 + a_2 b_1 (1 - h_2) + a_3 b_1 b_2 (1 - h_2)(1 - h_3) + \cdots + a_n b_1 b_2 \cdots b_{n-1} (1 - h_2)(1 - h_3) \cdots (1 - h_n)\bigr] = 1 \tag{4}$$


This equation places a restriction on the allowable harvesting fractions. Only those 
values of $h_1, h_2, \ldots, h_n$ that satisfy (4) and that lie in the interval [0, 1] can produce a 
sustainable yield. 

If $h_1, h_2, \ldots, h_n$ do satisfy (4), then the matrix $(I - H)L$ has the desired eigenvalue 
$\lambda_1 = 1$. Furthermore, this eigenvalue has multiplicity 1, because the positive eigenvalue 
of a Leslie matrix always has multiplicity 1 (Theorem 10.16.1). This means that there is 
only one linearly independent eigenvector $\mathbf{x}$ satisfying Equation (2). [See Exercise 3(b) 
of Section 10.16.] One possible choice for $\mathbf{x}$ is the following normalized eigenvector: 

$$\mathbf{x}_1 = \begin{bmatrix}
1 \\
b_1 (1 - h_2) \\
b_1 b_2 (1 - h_2)(1 - h_3) \\
b_1 b_2 b_3 (1 - h_2)(1 - h_3)(1 - h_4) \\
\vdots \\
b_1 b_2 b_3 \cdots b_{n-1} (1 - h_2)(1 - h_3) \cdots (1 - h_n)
\end{bmatrix} \tag{5}$$
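Equations (4) and (5) can be checked numerically. The following is a minimal sketch, not part of the text: the function names and the three-class values of a, b, and h are illustrative assumptions, chosen so that only the youngest class is harvested.

```python
def net_reproduction_rate(a, b, h):
    """Left side of Equation (4): the net reproduction rate of (I - H)L."""
    total = 0.0
    factor = 1.0 - h[0]                 # (1 - h_1) multiplies every term
    for i in range(len(a)):
        if i > 0:
            # extend the product to b_1 ... b_i (1 - h_2) ... (1 - h_{i+1})
            factor *= b[i - 1] * (1.0 - h[i])
        total += a[i] * factor
    return total


def normalized_eigenvector(b, h):
    """The vector x_1 of Equation (5)."""
    x = [1.0]
    for i in range(1, len(b) + 1):
        x.append(x[-1] * b[i - 1] * (1.0 - h[i]))
    return x


def apply_harvested_growth(a, b, h, x):
    """Compute (I - H) L x without building the matrices explicitly."""
    n = len(a)
    grown = [sum(a[i] * x[i] for i in range(n))]     # first row of L
    grown += [b[i] * x[i] for i in range(n - 1)]     # subdiagonal of L
    return [(1.0 - h[i]) * grown[i] for i in range(n)]


# Hypothetical three-class population (illustrative values only).
a = [0.0, 4.0, 3.0]
b = [0.5, 0.25]
R = a[0] + a[1] * b[0] + a[2] * b[0] * b[1]          # 2.375
h = [1.0 - 1.0 / R, 0.0, 0.0]                        # harvest the youngest class only

x1 = normalized_eigenvector(b, h)
print(net_reproduction_rate(a, b, h))                # should be 1 for a sustainable policy
print(x1, apply_harvested_growth(a, b, h, x1))       # the two vectors should agree
```

If the first printed value differs from 1, the chosen fractions violate Equation (4) and the policy is not sustainable.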
Any other solution $\mathbf{x}$ of (2) is a multiple of $\mathbf{x}_1$. Thus, the vector $\mathbf{x}_1$ determines the 
proportion of females within each of the $n$ classes after a harvest under a sustainable harvesting 
policy. But there is an ambiguity in the total number of females in the population after 
each harvest. This can be determined by some auxiliary condition, such as an ecological 
or economic constraint. For example, for a population economically supported by the 
harvester, the largest population the harvester can afford to raise between harvests would 
determine the particular constant that $\mathbf{x}_1$ is multiplied by to produce the appropriate 
vector $\mathbf{x}$ in Equation (2). For a wild population, the natural habitat of the population would 
determine how large the total population could be between harvests. 

Summarizing our results so far, we see that there is a wide choice in the values of 
$h_1, h_2, \ldots, h_n$ that will produce a sustainable yield. But once these values are selected, 
the proportional age distribution of the population after each harvest is uniquely 
determined by the normalized eigenvector $\mathbf{x}_1$ defined by Equation (5). We now consider a 
few particular harvesting strategies of this type. 

Uniform Harvesting 

With many populations it is difficult to distinguish or catch animals of specific ages. If 
animals are caught at random, we can reasonably assume that the same fraction of each 
age class is harvested. We therefore set 

$$h = h_1 = h_2 = \cdots = h_n$$

Equation (2) then reduces to (verify) 

$$L\mathbf{x} = \left(\frac{1}{1 - h}\right)\mathbf{x}$$

Hence, $1/(1 - h)$ must be the unique positive eigenvalue $\lambda_1$ of the Leslie growth matrix 
$L$. That is, 

$$\lambda_1 = \frac{1}{1 - h}$$

Solving for the harvesting fraction $h$, we obtain 

$$h = 1 - (1/\lambda_1) \tag{6}$$

The vector $\mathbf{x}_1$, in this case, is the same as the eigenvector of $L$ corresponding to the 
eigenvalue $\lambda_1$. From Equation (8) of Section 10.16, this is 

$$\mathbf{x}_1 = \begin{bmatrix}
1 \\
b_1/\lambda_1 \\
b_1 b_2/\lambda_1^2 \\
b_1 b_2 b_3/\lambda_1^3 \\
\vdots \\
b_1 b_2 \cdots b_{n-1}/\lambda_1^{n-1}
\end{bmatrix} \tag{7}$$

From (6) we can see that the larger $\lambda_1$ is, the larger is the fraction of animals we can 
harvest without depleting the population. Note that we need $\lambda_1 > 1$ in order for the 
harvesting fraction $h$ to lie in the interval (0, 1). This is to be expected, because $\lambda_1 > 1$ 
is the condition that the population be increasing. 
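Equations (6) and (7) can be tried out on a small case. The sketch below rests on two assumptions that do not come from this section: it uses the characterization from Section 10.16 that $\lambda_1$ is the unique positive root of $a_1/\lambda + a_2 b_1/\lambda^2 + \cdots + a_n b_1 \cdots b_{n-1}/\lambda^n = 1$, and it finds that root by bisection rather than with an eigenvalue routine. The three-class data are hypothetical.

```python
def dominant_eigenvalue(a, b, tol=1e-12):
    """Positive root of q(lam) = sum_i a_i b_1...b_(i-1) / lam**i = 1, found by bisection."""
    coeffs, prod = [], 1.0
    for i, a_i in enumerate(a):
        if i > 0:
            prod *= b[i - 1]
        coeffs.append(a_i * prod)

    def q(lam):
        return sum(c / lam ** (i + 1) for i, c in enumerate(coeffs))

    lo, hi = 1e-6, 1.0
    while q(hi) > 1.0:          # q is decreasing for lam > 0, so enlarge hi until q(hi) <= 1
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if q(mid) > 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def uniform_policy(a, b):
    """Return (lambda_1, h, x_1) for uniform harvesting, following Equations (6) and (7)."""
    lam = dominant_eigenvalue(a, b)
    h = 1.0 - 1.0 / lam
    x = [1.0]
    for b_i in b:
        x.append(x[-1] * b_i / lam)
    return lam, h, x


# Hypothetical three-class population (illustrative values only).
lam1, h, x1 = uniform_policy([0.0, 4.0, 3.0], [0.5, 0.25])
print(lam1, h, x1)
```

For these values the root is $\lambda_1 = 3/2$, so one third of every age class can be harvested each period.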


► EXAMPLE 1 Harvesting Sheep 

For a certain species of domestic sheep in New Zealand with a growth period of 1 year, 
the following Leslie matrix was found (see G. Caughley, “Parameters for Seasonally 
Breeding Populations,” Ecology, 48, 1967, pp. 834-839). 
$$L = \begin{bmatrix}
.000 & .045 & .391 & .472 & .484 & .546 & .543 & .502 & .468 & .459 & .433 & .421 \\
.845 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & .975 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & .965 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & .950 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & .926 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & .895 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & .850 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & .786 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & .691 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & .561 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & .370 & 0
\end{bmatrix}$$

The sheep have a lifespan of 12 years, so they are divided into 12 age classes of duration 
1 year each. By the use of numerical techniques, the unique positive eigenvalue of $L$ can 
be found to be 

$$\lambda_1 = 1.176$$

From Equation (6), the harvesting fraction $h$ is 

$$h = 1 - (1/\lambda_1) = 1 - (1/1.176) = .150$$

Thus, the uniform harvesting policy is one in which 15.0% of the sheep from each of the 
12 age classes is harvested every year. From (7) the age distribution vector of the sheep 
after each harvest is proportional to 

$$\mathbf{x}_1 = \begin{bmatrix} 1.000 \\ 0.719 \\ 0.596 \\ 0.489 \\ 0.395 \\ 0.311 \\ 0.237 \\ 0.171 \\ 0.114 \\ 0.067 \\ 0.032 \\ 0.010 \end{bmatrix} \tag{8}$$

From (8) we see that for every 1000 sheep between 0 and 1 year of age that are not 
harvested, there are 719 sheep between 1 and 2 years of age, 596 sheep between 2 and 3 
years of age, and so forth. 


Harvesting Only the Youngest Age Class 

In some populations only the youngest females are of any economic value, so the 
harvester seeks to harvest only the females from the youngest age class. Accordingly, let us 
set 

$$h_1 = h, \qquad h_2 = h_3 = \cdots = h_n = 0$$
Equation (4) then reduces to 

$$(1 - h)(a_1 + a_2 b_1 + a_3 b_1 b_2 + \cdots + a_n b_1 b_2 \cdots b_{n-1}) = 1$$

or 

$$(1 - h)R = 1$$

where $R$ is the net reproduction rate of the population. [See Equation (17) of 
Section 10.16.] Solving for $h$, we obtain 

$$h = 1 - (1/R) \tag{9}$$

Note from this equation that a sustainable harvesting policy is possible only if $R > 1$. 
This is reasonable because only if $R > 1$ is the population increasing. From Equation (5), 
the age distribution vector after each harvest is proportional to the vector 

$$\mathbf{x}_1 = \begin{bmatrix} 1 \\ b_1 \\ b_1 b_2 \\ b_1 b_2 b_3 \\ \vdots \\ b_1 b_2 b_3 \cdots b_{n-1} \end{bmatrix} \tag{10}$$


► EXAMPLE 2 Sustainable Harvesting Policy 

Let us apply this type of sustainable harvesting policy to the sheep population in 
Example 1. For the net reproduction rate of the population we find 

$$\begin{aligned}
R &= a_1 + a_2 b_1 + a_3 b_1 b_2 + \cdots + a_n b_1 b_2 \cdots b_{n-1} \\
  &= (.000) + (.045)(.845) + \cdots + (.421)(.845)(.975) \cdots (.370) \\
  &= 2.514
\end{aligned}$$

From Equation (9), the fraction of the first age class harvested is 

$$h = 1 - (1/R) = 1 - (1/2.514) = .602$$

From Equation (10), the age distribution of the sheep population after the harvest is 
proportional to the vector 

$$\mathbf{x}_1 = \begin{bmatrix} 1.000 \\ .845 \\ (.845)(.975) \\ (.845)(.975)(.965) \\ \vdots \\ (.845)(.975) \cdots (.370) \end{bmatrix}
= \begin{bmatrix} 1.000 \\ 0.845 \\ 0.824 \\ 0.795 \\ 0.755 \\ 0.699 \\ 0.626 \\ 0.532 \\ 0.418 \\ 0.289 \\ 0.162 \\ 0.060 \end{bmatrix} \tag{11}$$

A direct calculation gives us the following (see also Exercise 3): 

$$L\mathbf{x}_1 = \begin{bmatrix} 2.514 \\ 0.845 \\ 0.824 \\ 0.795 \\ 0.755 \\ 0.699 \\ 0.626 \\ 0.532 \\ 0.418 \\ 0.289 \\ 0.162 \\ 0.060 \end{bmatrix} \tag{12}$$

The vector $L\mathbf{x}_1$ is the age distribution vector immediately before the harvest. The total 
of all entries in $L\mathbf{x}_1$ is 8.520, so the first entry 2.514 is 29.5% of the total. This means that 
immediately before each harvest, 29.5% of the population is in the youngest age class. 
Since 60.2% of this class is harvested, it follows that 17.8% (= 60.2% of 29.5%) of the 
entire sheep population is harvested each year. This can be compared with the uniform 
harvesting policy of Example 1, in which 15.0% of the sheep population is harvested 
each year. 


Optimal Sustainable Yield 

We saw in Example 1 that a sustainable harvesting policy in which the same fraction 
of each age class is harvested produces a yield of 15.0% of the sheep population. In 
Example 2 we saw that if only the youngest age class is harvested, the resulting yield is 
17.8% of the population. There are many other possible sustainable harvesting policies, 
and each generally provides a different yield. It would be of interest to find a sustainable 
harvesting policy that produces the largest possible yield. Such a policy is called an 
optimal sustainable harvesting policy, and the resulting yield is called the optimal sustainable 
yield. However, determining the optimal sustainable yield requires linear programming 
theory, which we will not discuss here. We refer you to the following result, which 
appears in J. R. Beddington and D. B. Taylor, “Optimum Age Specific Harvesting of a 
Population,” Biometrics, 29, 1973, pp. 801-809. 


THEOREM 10.17.1 Optimal Sustainable Yield 

An optimal sustainable harvesting policy is one in which either one or two age classes 
are harvested. If two age classes are harvested, then the older age class is completely 
harvested. 


As an illustration, it can be shown that the optimal sustainable yield of the sheep 
population is attained when 

$$h_1 = 0.522, \qquad h_9 = 1.000 \tag{13}$$

and all other values of $h_i$ are zero. Thus, 52.2% of the sheep between 0 and 1 year of age 
and all the sheep between 8 and 9 years of age are harvested. As we ask you to show in 
Exercise 2, the resulting optimal sustainable yield is 19.9% of the population. 
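The three yields quoted for the sheep population (15.0%, 17.8%, and 19.9%) can be reproduced from the Leslie matrix of Example 1. The sketch below is an illustration under stated assumptions, not the method used in the text: the helper names are invented here, and $\lambda_1$ is found by bisection on the characteristic equation recalled from Section 10.16 rather than by a library eigenvalue routine.

```python
# Sheep data transcribed from the Leslie matrix in Example 1.
A = [.000, .045, .391, .472, .484, .546, .543, .502, .468, .459, .433, .421]  # first row a_i
B = [.845, .975, .965, .950, .926, .895, .850, .786, .691, .561, .370]        # subdiagonal b_i


def birth_terms(a, b):
    """Terms a_i * b_1 ... b_(i-1); their sum is the net reproduction rate R."""
    terms, prod = [], 1.0
    for i, a_i in enumerate(a):
        if i > 0:
            prod *= b[i - 1]
        terms.append(a_i * prod)
    return terms


def dominant_eigenvalue(a, b, tol=1e-12):
    """Bisection for the positive root of sum_i c_i / lam**i = 1 (assumed from Section 10.16)."""
    c = birth_terms(a, b)

    def q(lam):
        return sum(c_i / lam ** (i + 1) for i, c_i in enumerate(c))

    lo, hi = 1e-6, 1.0
    while q(hi) > 1.0:
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if q(mid) > 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def policy_summary(a, b, h):
    """Return (left side of Equation (4), fraction of the pre-harvest population harvested)."""
    n = len(a)
    x = [1.0]                                            # x_1 from Equation (5)
    for i in range(1, n):
        x.append(x[-1] * b[i - 1] * (1.0 - h[i]))
    grown = [sum(a[i] * x[i] for i in range(n))]         # first entry of L x_1
    grown += [b[i] * x[i] for i in range(n - 1)]         # remaining entries of L x_1
    harvested = sum(h[i] * grown[i] for i in range(n))   # sum of the entries of H L x_1
    return (1.0 - h[0]) * grown[0], harvested / sum(grown)


n = len(A)
lam1 = dominant_eigenvalue(A, B)
R = sum(birth_terms(A, B))

uniform = [1.0 - 1.0 / lam1] * n                  # Equation (6)
youngest = [1.0 - 1.0 / R] + [0.0] * (n - 1)      # Equation (9)
optimal = [0.0] * n
optimal[0], optimal[8] = 0.522, 1.0               # Equations (13)

for name, h in [("uniform", uniform), ("youngest only", youngest), ("optimal", optimal)]:
    check, frac = policy_summary(A, B, h)
    print(f"{name:14s} Eq.(4) left side = {check:.3f}   yield = {100 * frac:.1f}%")
```

The yield here is measured, as in Example 2, as the harvested fraction of the population present immediately before the harvest. Because $h_1 = 0.522$ is a rounded value, the left side of Equation (4) for the optimal policy comes out close to, but not exactly, 1.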
Exercise Set 10.17 

1. Let a certain animal population be divided into three 1-year 
age classes and have as its Leslie matrix 

L = 

(a) Find the yield and the age distribution vector after each 
harvest if the same fraction of each of the three age classes 
is harvested every year. 

(b) Find the yield and the age distribution vector after each 
harvest if only the youngest age class is harvested every 
year. Also, find the fraction of the youngest age class that 
is harvested. 

2. For the optimal sustainable harvesting policy described by 
Equations (13), find the vector $\mathbf{x}_1$ that specifies the age 
distribution of the population after each harvest. Also calculate 
the vector $L\mathbf{x}_1$ and verify that the optimal sustainable yield is 
19.9% of the population. 

3. Use Equation (10) to show that if only the first age class of an 
animal population is harvested, 

$$L\mathbf{x}_1 - \mathbf{x}_1 = \begin{bmatrix} R - 1 \\ 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}$$

where $R$ is the net reproduction rate of the population. 

4. If only the $I$th class of an animal population is to be periodically 
harvested ($I = 1, 2, \ldots, n$), find the corresponding harvesting 
fraction $h_I$. 

5. Suppose that all of the $J$th class and a certain fraction $h_I$ of the 
$I$th class of an animal population is to be periodically harvested 
($1 \le I < J \le n$). Calculate $h_I$. 


Working with Technology 

The following exercises are designed to be solved using a technology 
utility. Typically, this will be MATLAB, Mathematica, Maple, 
Derive, or Mathcad, but it may also be some other type of linear 
algebra software or a scientific calculator with some linear algebra 
capabilities. For each exercise you will need to read the relevant 
documentation for the particular utility you are using. The goal 
of these exercises is to provide you with a basic proficiency with 
your technology utility. Once you have mastered the techniques 
in these exercises, you will be able to use your technology utility 
to solve many of the problems in the regular exercise sets. 

T1. The results of Theorem 10.17.1 suggest the following 
algorithm for determining the optimal sustainable yield. 

(i) For each value of $i = 1, 2, \ldots, n$, set $h_i = h$ and $h_k = 0$ for 
$k \ne i$ and calculate the respective yields. These $n$ calculations 
give the one-age-class results. Of course, any calculation 
leading to a value of $h$ not between 0 and 1 is rejected. 

(ii) For each value of $i = 1, 2, \ldots, n - 1$ and $j = i + 1, i + 2, \ldots, n$, 
set $h_i = h$, $h_j = 1$, and $h_k = 0$ for $k \ne i, j$ and 
calculate the respective yields. These $\frac{1}{2}n(n - 1)$ calculations 
give the two-age-class results. Of course, any calculation 
leading to a value of $h$ not between 0 and 1 is again rejected. 

(iii) Of the yields calculated in parts (i) and (ii), the largest is the 
optimal sustainable yield. Note that there will be at most 

$$n + \tfrac{1}{2}n(n - 1) = \tfrac{1}{2}n(n + 1)$$

calculations in all. Once again, some of these may lead to a 
value of $h$ not between 0 and 1 and must therefore be rejected. 

If we use this algorithm for the sheep example in the text, there 
will be at most $\frac{1}{2}(12)(12 + 1) = 78$ calculations to consider. Use a 
computer to do the two-age-class calculations for $h_1 = h$, $h_j = 1$, 
and $h_k = 0$ for $k \ne 1$ or $j$ for $j = 2, 3, \ldots, 12$. Construct a 
summary table consisting of the values of $h_1$ and the percentage yields 
using $j = 2, 3, \ldots, 12$, which will show that the largest of these 
yields occurs when $j = 9$. 

T2. Using the algorithm in Exercise T1, do the one-age-class 
calculations for $h_i = h$ and $h_k = 0$ for $k \ne i$ for $i = 1, 2, \ldots, 12$. 
Construct a summary table consisting of the values of $h_i$ and the 
percentage yields using $i = 1, 2, \ldots, 12$, which will show that the 
largest of these yields occurs when $i = 9$. 

T3. Referring to the mouse population in Exercise T3 of Section 
10.16, suppose that reducing the birthrates is not practical, so you 
instead decide to control the population by uniformly harvesting 
all of the age classes monthly. 

(a) What fraction of the population must be harvested monthly 
to bring the mouse population to equilibrium eventually? 

(b) What is the equilibrium age distribution vector under this uni- 
form harvesting policy? 

(c) The total number of mice in the original mouse population 
was 155. What would be the total number of mice after 5, 10, 
and 200 months under your uniform harvesting policy? 
10.18 A Least Squares Model for Human Hearing 

In this section we apply the method of least squares approximation to a model for human 
hearing. The use of this method is motivated by energy considerations. 


PREREQUISITES: Inner Product Spaces 
Orthogonal Projection 
Fourier Series (Section 6.6) 


Anatomy of the Ear 

We begin with a brief discussion of the nature of sound and human hearing. Figure 
10.18.1 is a schematic diagram of the ear showing its three main components: the outer 
ear, middle ear, and inner ear. Sound waves enter the outer ear where they are channeled 
to the eardrum, causing it to vibrate. Three tiny bones in the middle ear mechanically 
link the eardrum with the snail-shaped cochlea within the inner ear. These bones pass 
on the vibrations of the eardrum to a fluid within the cochlea. The cochlea contains 
thousands of minute hairs that oscillate with the fluid. Those near the entrance of the 
cochlea are stimulated by high frequencies, and those near the tip are stimulated by low 
frequencies. The movements of these hairs activate nerve cells that send signals along 
various neural pathways to the brain, where the signals are interpreted as sound. 


► Figure 10.18.1 


The sound waves themselves are variations in time of the air pressure. For the 
auditory system, the most elementary type of sound wave is a sinusoidal variation in the 
air pressure. This type of sound wave stimulates the hairs within the cochlea in such a 
way that a nerve impulse along a single neural pathway is produced (Figure 10.18.2). A 
sinusoidal sound wave can be described by a function of time 

$$q(t) = A_0 + A \sin(\omega t - \delta) \tag{1}$$

► Figure 10.18.2 

where $q(t)$ is the atmospheric pressure at the eardrum, $A_0$ is the normal atmospheric 
pressure, $A$ is the maximum deviation of the pressure from the normal atmospheric pressure, 
$\omega/2\pi$ is the frequency of the wave in cycles per second, and $\delta$ is the phase angle of the 
wave. To be perceived as sound, such sinusoidal waves must have frequencies within a 
certain range. For humans this range is roughly 20 cycles per second (cps) to 20,000 cps. 
Frequencies outside this range will not stimulate the hairs within the cochlea enough to 
produce nerve signals. 

To a reasonable degree of accuracy, the ear is a linear system. This means that if a 
complex sound wave is a finite sum of sinusoidal components of different amplitudes, 
frequencies, and phase angles, say, 

$$q(t) = A_0 + A_1 \sin(\omega_1 t - \delta_1) + A_2 \sin(\omega_2 t - \delta_2) + \cdots + A_n \sin(\omega_n t - \delta_n) \tag{2}$$

then the response of the ear consists of nerve impulses along the same neural pathways 
that would be stimulated by the individual components (Figure 10.18.3). 



▲ Figure 10.18.3 

Let us now consider some periodic sound wave $p(t)$ with period $T$ [i.e., $p(t) = 
p(t + T)$] that is not a finite sum of sinusoidal waves. If we examine the response of 
the ear to such a periodic wave, we find that it is the same as the response to some wave 
that is the sum of sinusoidal waves. That is, there is some sound wave $q(t)$ as given by 
Equation (2) that produces the same response as $p(t)$, even though $p(t)$ and $q(t)$ are 
different functions of time. 

We now want to determine the frequencies, amplitudes, and phase angles of the 
sinusoidal components of $q(t)$. Because $q(t)$ produces the same response as the periodic 
wave $p(t)$, it is reasonable to expect that $q(t)$ has the same period $T$ as $p(t)$. This 
requires that each sinusoidal term in $q(t)$ have period $T$. Consequently, the frequencies 
of the sinusoidal components must be integer multiples of the basic frequency $1/T$ of 
the function $p(t)$. Thus, the $\omega_k$ in Equation (2) must be of the form 

$$\omega_k = 2k\pi/T, \qquad k = 1, 2, \ldots$$

But because the ear cannot perceive sinusoidal waves with frequencies greater than 
20,000 cps, we may omit those values of $k$ for which $\omega_k/2\pi = k/T$ is greater than 
20,000. Thus, $q(t)$ is of the form 

$$q(t) = A_0 + A_1 \sin\left(\frac{2\pi t}{T} - \delta_1\right) + A_2 \sin\left(\frac{4\pi t}{T} - \delta_2\right) + \cdots + A_n \sin\left(\frac{2n\pi t}{T} - \delta_n\right) \tag{3}$$

where $n$ is the largest integer such that $n/T$ is not greater than 20,000. 

We now turn our attention to the values of the amplitudes $A_0, A_1, \ldots, A_n$ and the 
phase angles $\delta_1, \delta_2, \ldots, \delta_n$ that appear in Equation (3). There is some criterion by which 
the auditory system “picks” these values so that $q(t)$ produces the same response as $p(t)$. 
To examine this criterion, let us set 

$$e(t) = p(t) - q(t)$$

If we consider $q(t)$ as an approximation to $p(t)$, then $e(t)$ is the error in this 
approximation, an error that the ear cannot perceive. In terms of $e(t)$, the criterion for the 
determination of the amplitudes and the phase angles is that the quantity 

$$\int_0^T [e(t)]^2 \, dt = \int_0^T [p(t) - q(t)]^2 \, dt \tag{4}$$

be as small as possible. We cannot go into the physiological reasons for this, but we note 
that this expression is proportional to the acoustic energy of the error wave $e(t)$ over one 
period. In other words, it is the energy of the difference between the two sound waves 
$p(t)$ and $q(t)$ that determines whether the ear perceives any difference between them. 
If this energy is as small as possible, then the two waves produce the same sensation of 
sound. Mathematically, the function $q(t)$ in (4) is the least squares approximation to 
$p(t)$ from the vector space $C[0, T]$ of continuous functions on the interval $[0, T]$. (See 
Section 6.6.) 

Least squares approximations by continuous functions arise in a wide variety of 
engineering and scientific approximation problems. Apart from the acoustics problem 
just discussed, some other examples follow. 

1. Let $S(x)$ be the axial strain distribution in a uniform rod lying along the x-axis from 
$x = 0$ to $x = l$ (Figure 10.18.4). The strain energy in the rod is proportional to the 
integral 

$$\int_0^l [S(x)]^2 \, dx$$

The closeness of an approximation $q(x)$ to $S(x)$ can be judged according to the strain 
energy of the difference of the two strain distributions. That energy is proportional 
to 

$$\int_0^l [S(x) - q(x)]^2 \, dx$$

which is a least squares criterion. 

2. Let $E(t)$ be a periodic voltage across a resistor in an electrical circuit (Figure 10.18.5). 
The electrical energy transferred to the resistor during one period $T$ is proportional 
to 

$$\int_0^T [E(t)]^2 \, dt$$

If $q(t)$ has the same period as $E(t)$ and is to be an approximation to $E(t)$, then the 
criterion of closeness might be taken as the energy of the difference voltage. This is 
proportional to 

$$\int_0^T [E(t) - q(t)]^2 \, dt$$

which is again a least squares criterion. 


3. Let $y(x)$ be the vertical displacement of a uniform flexible string whose equilibrium 
position is along the x-axis from $x = 0$ to $x = l$ (Figure 10.18.6). The elastic potential 
energy of the string is proportional to 



dx 
If $q(x)$ is to be an approximation to the displacement, then as before, the energy 
integral 



determines a least squares criterion for the closeness of the approximation. 

Least squares approximation is also used in situations where there is no a priori justi- 
fication for its use, such as for approximating business cycles, population growth curves, 
sales curves, and so forth. It is used in these cases because of its mathematical simplic- 
ity. In general, if no other error criterion is immediately apparent for an approximation 
problem, the least squares criterion is the one most often chosen. 

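The following result was obtained in Section 6.6. 

THEOREM 10.18.1 Minimizing Mean Square Error on $[0, 2\pi]$ 

If $f(t)$ is continuous on $[0, 2\pi]$, then the trigonometric function $g(t)$ of the form 

$$g(t) = \tfrac{1}{2}a_0 + a_1 \cos t + \cdots + a_n \cos nt + b_1 \sin t + \cdots + b_n \sin nt$$

that minimizes the mean square error 

$$\int_0^{2\pi} [f(t) - g(t)]^2 \, dt$$

has coefficients 

$$a_k = \frac{1}{\pi} \int_0^{2\pi} f(t) \cos kt \, dt, \qquad k = 0, 1, 2, \ldots, n$$

$$b_k = \frac{1}{\pi} \int_0^{2\pi} f(t) \sin kt \, dt, \qquad k = 1, 2, \ldots, n$$

If the original function $f(t)$ is defined over the interval $[0, T]$ instead of $[0, 2\pi]$, a 
change of scale will yield the following result (see Exercise 8): 

THEOREM 10.18.2 Minimizing Mean Square Error on $[0, T]$ 

If $f(t)$ is continuous on $[0, T]$, then the trigonometric function $g(t)$ of the form 

$$g(t) = \tfrac{1}{2}a_0 + a_1 \cos \frac{2\pi}{T} t + \cdots + a_n \cos \frac{2n\pi}{T} t + b_1 \sin \frac{2\pi}{T} t + \cdots + b_n \sin \frac{2n\pi}{T} t$$

that minimizes the mean square error 

$$\int_0^T [f(t) - g(t)]^2 \, dt$$

has coefficients 

$$a_k = \frac{2}{T} \int_0^T f(t) \cos \frac{2k\pi t}{T} \, dt, \qquad k = 0, 1, 2, \ldots, n$$

$$b_k = \frac{2}{T} \int_0^T f(t) \sin \frac{2k\pi t}{T} \, dt, \qquad k = 1, 2, \ldots, n$$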
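When the integrals for $a_k$ and $b_k$ are awkward to evaluate by hand, they can be approximated numerically. The sketch below is one possible approach under stated assumptions: it applies the midpoint rule to the coefficient formulas on $[0, T]$, and the function names and the test function $f(t) = t$ on $[0, 2\pi]$ are chosen here for illustration.

```python
import math


def fourier_coefficients(f, T, n, samples=4000):
    """Approximate a_0..a_n and b_1..b_n on [0, T] with the midpoint rule."""
    dt = T / samples
    a = [0.0] * (n + 1)
    b = [0.0] * (n + 1)          # b[0] is unused and stays 0
    for s in range(samples):
        t = (s + 0.5) * dt       # midpoint of the s-th subinterval
        ft = f(t)
        for k in range(n + 1):
            w = 2.0 * math.pi * k * t / T
            a[k] += ft * math.cos(w)
            if k > 0:
                b[k] += ft * math.sin(w)
    scale = (2.0 / T) * dt
    return [v * scale for v in a], [v * scale for v in b]


def trig_polynomial(a, b, T, t):
    """Evaluate g(t) = a_0/2 + sum_k (a_k cos(2 k pi t/T) + b_k sin(2 k pi t/T))."""
    g = 0.5 * a[0]
    for k in range(1, len(a)):
        w = 2.0 * math.pi * k * t / T
        g += a[k] * math.cos(w) + b[k] * math.sin(w)
    return g


# Illustration: f(t) = t on [0, 2*pi], for which a_0 = 2*pi, a_k = 0, and b_k = -2/k.
a, b = fourier_coefficients(lambda t: t, 2.0 * math.pi, 3)
print(a)
print(b[1:])
print(trig_polynomial(a, b, 2.0 * math.pi, math.pi))   # close to f(pi) = pi
```

The number of sample points controls the accuracy; the default of 4000 is an arbitrary choice that is adequate for the low-order coefficients used here.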
► EXAMPLE 1 Least Squares Approximation to a Sound Wave 

Let a sound wave $p(t)$ have a saw-tooth pattern with a basic frequency of 5000 cps 
(Figure 10.18.7). Assume units are chosen so that the normal atmospheric pressure is 
at the zero level and the maximum amplitude of the wave is $A$. The basic period of the 
wave is $T = 1/5000 = .0002$ second. From $t = 0$ to $t = T$, the function $p(t)$ has the 
equation 

$$p(t) = \frac{2A}{T}\left(\frac{T}{2} - t\right)$$

Theorem 10.18.2 then yields the following (verify): 

$$a_0 = 0, \qquad a_k = 0, \qquad b_k = \frac{2A}{k\pi}, \qquad k = 1, 2, \ldots$$

We can now investigate how the sound wave $p(t)$ is perceived by the human ear. We note 
that $4/T = 20{,}000$ cps, so we need only go up to $k = 4$ in the formulas above. The least 
squares approximation to $p(t)$ is then 

$$q(t) = \frac{2A}{\pi}\left[\sin \frac{2\pi}{T} t + \frac{1}{2} \sin \frac{4\pi}{T} t + \frac{1}{3} \sin \frac{6\pi}{T} t + \frac{1}{4} \sin \frac{8\pi}{T} t\right]$$

The four sinusoidal terms have frequencies of 5000, 10,000, 15,000, and 20,000 cps, 
respectively. In Figure 10.18.8 we have plotted $p(t)$ and $q(t)$ over one period. Although 
$q(t)$ is not a very good point-by-point approximation to $p(t)$, to the ear, both $p(t)$ and 
$q(t)$ produce the same sensation of sound. 


► Figure 10.18.8 
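The numbers in this example can be checked with a few lines of code. The sketch below assumes a unit amplitude $A = 1$ (an arbitrary choice) and approximates the sine coefficients of the saw-tooth wave by the midpoint rule; the names `p`, `q`, and `b_numeric` are introduced here for illustration.

```python
import math

A = 1.0                    # amplitude (assumed unit value for illustration)
f0 = 5000                  # basic frequency in cps
T = 1.0 / f0               # basic period in seconds
n = 20000 // f0            # largest n with n/T not greater than 20,000 cps


def p(t):
    """Saw-tooth wave on 0 <= t <= T."""
    return (2.0 * A / T) * (T / 2.0 - t)


def q(t):
    """Least squares approximation built from the n audible sine terms."""
    return (2.0 * A / math.pi) * sum(
        math.sin(2.0 * math.pi * k * t / T) / k for k in range(1, n + 1)
    )


def b_numeric(k, samples=4000):
    """Midpoint-rule approximation of b_k = (2/T) * integral of p(t) sin(2 k pi t / T)."""
    dt = T / samples
    total = 0.0
    for s in range(samples):
        t = (s + 0.5) * dt
        total += p(t) * math.sin(2.0 * math.pi * k * t / T)
    return (2.0 / T) * total * dt


print(n)
for k in range(1, n + 1):
    print(k, b_numeric(k), 2.0 * A / (k * math.pi))   # numeric value vs. 2A/(k*pi)
print(p(0.0), q(0.0))      # the two waves differ noticeably at t = 0
```

The last line illustrates the remark about point-by-point accuracy: at $t = 0$ the saw-tooth wave is at its maximum $A$, while every sine term of $q(t)$ vanishes.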



As discussed in Section 6.6, the least squares approximation becomes better as the 
number of terms in the approximating trigonometric polynomial becomes larger. More 
precisely, 

$$\int_0^{2\pi} \left[f(t) - \frac{1}{2}a_0 - \sum_{k=1}^{n} (a_k \cos kt + b_k \sin kt)\right]^2 dt$$

tends to zero as $n$ approaches infinity. We denote this by writing 

$$f(t) \sim \frac{1}{2}a_0 + \sum_{k=1}^{\infty} (a_k \cos kt + b_k \sin kt)$$

where the right side of this equation is the Fourier series of $f(t)$. Whether the Fourier 
series of $f(t)$ converges to $f(t)$ for each $t$ is another question, and a more difficult 
one. For most continuous functions encountered in applications, the Fourier series does 
indeed converge to its corresponding function for each value of $t$. 


Exercise Set 10.18 


1. Find the trigonometric polynomial of order 3 that is the least 
squares approximation to the function $f(t) = (t - \pi)^2$ over 
the interval $[0, 2\pi]$. 

2. Find the trigonometric polynomial of order 4 that is the least 
squares approximation to the function $f(t) = t^2$ over the 
interval $[0, T]$. 

3. Find the trigonometric polynomial of order 4 that is the least 
squares approximation to the function $f(t)$ over the interval 
$[0, 2\pi]$, where 

$$f(t) = \begin{cases} \sin t, & 0 \le t \le \pi \\ 0, & \pi < t \le 2\pi \end{cases}$$

4. Find the trigonometric polynomial of arbitrary order $n$ that is 
the least squares approximation to the function $f(t) = \sin \frac{1}{2}t$ 
over the interval $[0, 2\pi]$. 

5. Find the trigonometric polynomial of arbitrary order $n$ that is 
the least squares approximation to the function $f(t)$ over the 
interval $[0, T]$, where 

$$f(t) = \begin{cases} t, & 0 \le t \le \frac{1}{2}T \\ T - t, & \frac{1}{2}T < t \le T \end{cases}$$

6. For the inner product 

$$\langle \mathbf{u}, \mathbf{v} \rangle = \int_0^{2\pi} u(t)v(t) \, dt$$

show that 

(a) $\|1\| = \sqrt{2\pi}$ 

(b) $\|\cos kt\| = \sqrt{\pi}$ for $k = 1, 2, \ldots$ 

(c) $\|\sin kt\| = \sqrt{\pi}$ for $k = 1, 2, \ldots$ 

7. Show that the $2n + 1$ functions 

$$1, \cos t, \cos 2t, \ldots, \cos nt, \sin t, \sin 2t, \ldots, \sin nt$$

are orthogonal over the interval $[0, 2\pi]$ relative to the inner 
product $\langle \mathbf{u}, \mathbf{v} \rangle$ defined in Exercise 6. 

8. If $f(t)$ is defined and continuous on the interval $[0, T]$, show 
that $f(T\tau/2\pi)$ is defined and continuous for $\tau$ in the interval 
$[0, 2\pi]$. Use this fact to show how Theorem 10.18.2 follows 
from Theorem 10.18.1. 


Working with Technology 

The following exercises are designed to be solved using a technology 
utility. Typically, this will be MATLAB, Mathematica, Maple, 
Derive, or Mathcad, but it may also be some other type of linear 
algebra software or a scientific calculator with some linear algebra 
capabilities. For each exercise you will need to read the relevant 
documentation for the particular utility you are using. The goal 
of these exercises is to provide you with a basic proficiency with 
your technology utility. Once you have mastered the techniques 
in these exercises, you will be able to use your technology utility 
to solve many of the problems in the regular exercise sets. 

T1. Let $g$ be the function 

$$g(t) = \frac{3 + 4 \sin t}{5 - 4 \cos t}$$

for $0 \le t \le 2\pi$. Use a computer to determine the Fourier coefficients 

$$a_k = \frac{1}{\pi} \int_0^{2\pi} g(t) \cos kt \, dt, \qquad b_k = \frac{1}{\pi} \int_0^{2\pi} g(t) \sin kt \, dt$$

for $k = 0, 1, 2, 3, 4, 5$. From your results, make a conjecture 
about the general expressions for $a_k$ and $b_k$. Test your conjecture 
by calculating 

$$\frac{1}{2}a_0 + \sum_{k=1}^{\infty} (a_k \cos kt + b_k \sin kt)$$

on the computer and see whether it converges to $g(t)$. 

T2. Let $g$ be the function 

$$g(t) = e^{\cos t}[\cos(\sin t) + \sin(\sin t)]$$

for $0 \le t \le 2\pi$. Use a computer to determine the Fourier coefficients 

$$a_k = \frac{1}{\pi} \int_0^{2\pi} g(t) \cos kt \, dt, \qquad b_k = \frac{1}{\pi} \int_0^{2\pi} g(t) \sin kt \, dt$$

for $k = 0, 1, 2, 3, 4, 5$. From your results, make a conjecture 
about the general expressions for $a_k$ and $b_k$. Test your conjecture 
by calculating 

$$\frac{1}{2}a_0 + \sum_{k=1}^{\infty} (a_k \cos kt + b_k \sin kt)$$

on the computer and see whether it converges to $g(t)$. 
10.19 Warps and Morphs 

Among the more interesting image-manipulation techniques available for computer 
graphics are warps and morphs. In this section we show how linear transformations can be 
used to distort a single picture to produce a warp, or to distort and blend two pictures to 
produce a morph. 


PREREQUISITES: Geometry of Linear Operators on $R^2$ (Section 4.11) 
Linear Independence 
Bases in $R^2$ 


Computer graphics software enables you to manipulate an image in various ways, such 
as by scaling, rotating, or slanting the image. Distorting an image by separately moving 
the corners of a rectangle containing the image is another basic image-manipulation 
technique. Distorting various pieces of an image in different ways is a more complicated 
procedure that results in a warp of the picture. In addition, warping two different images 
in complementary ways and blending the warps results in a morph of the two pictures 
(from the Greek root meaning “shape” or “form”). An example is Figure 10.19.1 in which 
four photographs of a woman taken over a 50-year period (the four diagonal pictures 
from top left to bottom right) have been pairwise morphed by different amounts to 
suggest the gradual aging of the woman. 



► Figure 10.19.1 
The most visible application of warping and morphing images has been the production 
of special effects in motion pictures and television. However, many scientific and 
technological applications of such techniques have also arisen — for example, studying 
the evolution, growth, and development of living organisms, assisting in reconstructive 
and cosmetic surgery, exploring various designs of a product, and “aging” photographs 
of missing persons or police suspects. 


Warps 

We begin by describing a simple warp of a triangular region in the plane. Let the three 
vertices of a triangle be given by the three noncollinear points $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$ (Figure 
10.19.2a). We will call this triangle the begin-triangle. If $\mathbf{v}$ is any point in the 
begin-triangle, then there are unique constants $c_1$ and $c_2$ such that 

$$\mathbf{v} - \mathbf{v}_3 = c_1(\mathbf{v}_1 - \mathbf{v}_3) + c_2(\mathbf{v}_2 - \mathbf{v}_3) \tag{1}$$

Equation (1) expresses the vector $\mathbf{v} - \mathbf{v}_3$ as a (unique) linear combination of the two 
linearly independent vectors $\mathbf{v}_1 - \mathbf{v}_3$ and $\mathbf{v}_2 - \mathbf{v}_3$ with respect to an origin at $\mathbf{v}_3$. If we set 
$c_3 = 1 - c_1 - c_2$, then we can rewrite (1) as 

$$\mathbf{v} = c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + c_3\mathbf{v}_3 \tag{2}$$

where 

$$c_1 + c_2 + c_3 = 1 \tag{3}$$

from the definition of $c_3$. We say that $\mathbf{v}$ is a convex combination of the vectors $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$ 
if (2) and (3) are satisfied and, in addition, the coefficients $c_1$, $c_2$, and $c_3$ are nonnegative. 
It can be shown (Exercise 6) that $\mathbf{v}$ lies in the triangle determined by $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$ if and 
only if it is a convex combination of those three vectors. 

▲ Figure 10.19.2 (panels (a) and (b)) 

▲ Figure 10.19.3 (panels (a) and (b)) 

Next, given three noncollinear points $\mathbf{w}_1$, $\mathbf{w}_2$, and $\mathbf{w}_3$ of an end-triangle (Figure
10.19.2b), there is a unique affine transformation that maps $\mathbf{v}_1$ to $\mathbf{w}_1$, $\mathbf{v}_2$ to $\mathbf{w}_2$, and
$\mathbf{v}_3$ to $\mathbf{w}_3$. That is, there is a unique $2 \times 2$ invertible matrix $M$ and a unique vector $\mathbf{b}$ such
that

$$\mathbf{w}_i = M\mathbf{v}_i + \mathbf{b} \quad \text{for } i = 1, 2, 3 \tag{4}$$

(See Exercise 5 for the evaluation of $M$ and $\mathbf{b}$.) Moreover, it can be shown (Exercise 3)
that the image $\mathbf{w}$ of the vector $\mathbf{v}$ in (2) under this affine transformation is

$$\mathbf{w} = c_1\mathbf{w}_1 + c_2\mathbf{w}_2 + c_3\mathbf{w}_3 \tag{5}$$

This is a basic property of affine transformations: They map a convex combination of
vectors to the same convex combination of the images of the vectors.
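
The following is a minimal sketch, not the book's own procedure, of how $M$ and $\mathbf{b}$ in (4) might be computed and how property (5) can be checked numerically. The function names and the two sample triangles are hypothetical. Exact rational arithmetic is used so that the comparison at the end is an exact equality.

```python
from fractions import Fraction


def solve_2x2(a, b, c, d, e, f):
    """Solve a*x + b*y = e, c*x + d*y = f by Cramer's rule."""
    det = a * d - b * c
    if det == 0:
        raise ValueError("the begin-triangle vertices are collinear")
    return (e * d - b * f) / det, (a * f - e * c) / det


def affine_from_triangles(v, w):
    """Return (M, b) with w[i] = M v[i] + b for the three vertex pairs.

    Subtracting the i = 3 case of (4) from the other two eliminates b:
    M (v_i - v_3) = w_i - w_3, which is solved one row of M at a time.
    """
    (x1, y1), (x2, y2), (x3, y3) = v
    M = []
    for r in range(2):  # r = 0 gives the first row of M, r = 1 the second
        M.append(solve_2x2(x1 - x3, y1 - y3, x2 - x3, y2 - y3,
                           w[0][r] - w[2][r], w[1][r] - w[2][r]))
    b = tuple(w[2][r] - M[r][0] * x3 - M[r][1] * y3 for r in range(2))
    return M, b


def apply_affine(M, b, p):
    """Return M p + b for a point p = (x, y)."""
    return tuple(M[r][0] * p[0] + M[r][1] * p[1] + b[r] for r in range(2))


def combine(c, pts):
    """Return c[0]*pts[0] + c[1]*pts[1] + c[2]*pts[2]."""
    return tuple(sum(ci * p[r] for ci, p in zip(c, pts)) for r in range(2))


# Hypothetical triangles, not taken from the text.
F = Fraction
v = [(F(0), F(0)), (F(4), F(0)), (F(0), F(2))]
w = [(F(1), F(1)), (F(5), F(2)), (F(2), F(4))]
M, b = affine_from_triangles(v, w)

c = (F(1, 2), F(1, 4), F(1, 4))  # nonnegative coefficients that sum to 1
print(apply_affine(M, b, combine(c, v)) == combine(c, w))  # True
```

For these sample triangles the sketch gives $M$ with rows $(1, 1/2)$ and $(1/4, 3/2)$ and $\mathbf{b} = (1, 1)$, and the image of the convex combination equals the same convex combination of the images, as (5) asserts.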

Now suppose that the begin-triangle contains a picture within it (Figure 10.19.3a).
That is, to each point in the begin-triangle we assign a gray level, say 0 for white and
100 for black, with any other gray level lying between 0 and 100. In particular, let a
scalar-valued function $\rho_0$, called the picture-density of the begin-triangle, be defined so
that $\rho_0(\mathbf{v})$ is the gray level at the point $\mathbf{v}$ in the begin-triangle. We can now define a
picture in the end-triangle, called a warp of the original picture, with a picture-density
$\rho_1$ by defining the gray level at the point $\mathbf{w}$ within the end-triangle to be the gray level of
the point $\mathbf{v}$ in the begin-triangle that maps onto $\mathbf{w}$. In equation form, the picture-density
$\rho_1$ is determined by

$$\rho_1(\mathbf{w}) = \rho_0(c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + c_3\mathbf{v}_3) \tag{6}$$

In this way, as $c_1$, $c_2$, and $c_3$ vary over all nonnegative values that add to one, (5) generates
all points $\mathbf{w}$ in the end-triangle, and (6) generates the gray levels $\rho_1(\mathbf{w})$ of the warped
picture at those points (Figure 10.19.3b).
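
As an illustration of Equation (6), the sketch below evaluates a warped picture-density at a point of the end-triangle. It is only one possible arrangement: the names `barycentric` and `warp_density`, the sample triangles, and the sample density are all assumptions made for the example. The coefficients are found by solving (1) for the end-triangle and then reused in the begin-triangle.

```python
def barycentric(p, tri):
    """Return (c1, c2, c3) with p = c1*t1 + c2*t2 + c3*t3 and c1 + c2 + c3 = 1."""
    (x1, y1), (x2, y2), (x3, y3) = tri
    det = (x1 - x3) * (y2 - y3) - (x2 - x3) * (y1 - y3)
    c1 = ((p[0] - x3) * (y2 - y3) - (x2 - x3) * (p[1] - y3)) / det
    c2 = ((x1 - x3) * (p[1] - y3) - (p[0] - x3) * (y1 - y3)) / det
    return c1, c2, 1 - c1 - c2


def warp_density(rho0, begin_tri, end_tri):
    """Return the function rho1 defined on the end-triangle by Equation (6)."""
    def rho1(w):
        c = barycentric(w, end_tri)
        if min(c) < -1e-12:  # small tolerance for floating-point round-off
            raise ValueError("w lies outside the end-triangle")
        # The same coefficients locate the matching point v in the begin-triangle.
        v = tuple(sum(ci * vi[r] for ci, vi in zip(c, begin_tri)) for r in range(2))
        return rho0(v)
    return rho1


# Hypothetical data: gray level rises from 0 to 100 across the begin-triangle.
begin = [(0.0, 0.0), (4.0, 0.0), (0.0, 2.0)]
end = [(1.0, 1.0), (5.0, 2.0), (2.0, 4.0)]
rho1 = warp_density(lambda v: 25.0 * v[0], begin, end)

print(rho1((5.0, 2.0)))  # gray level carried over from the vertex (4, 0)
```

Evaluating at the second vertex of the end-triangle returns the gray level at the second vertex of the begin-triangle, and a point outside the end-triangle is rejected because one of its coefficients is negative.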

▲ Figure 10.19.4

Equation (6) determines a very simple warp of a picture within a single triangle.
More generally, we can break up a picture into many triangular regions and warp each
triangular region differently. This gives us much freedom in designing a warp through
our choice of triangular regions and how we change them. To this end, suppose we are
given a picture contained within some rectangular region of the plane. We choose $n$
points $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$ within the rectangle, which we call vertex points, so that they fall
on key elements or features of the picture we wish to warp (Figure 10.19.4a). Once the
vertex points are chosen, we complete a triangulation of the rectangular region; that is,
we draw line segments between the vertex points in such a way that we have the following
conditions (Figure 10.19.4b):

1. The line segments form the sides of a set of triangles.

2. The line segments do not intersect. 

3. Each vertex point is the vertex of at least one triangle. 

4. The union of the triangles is the rectangle. 

5. The set of triangles is maximal (i.e., no more vertices can be connected). 

Note that condition 4 requires that each corner of the rectangle containing the picture 
be a vertex point. 

One can always form a triangulation from any $n$ vertex points, but the triangulation
is not necessarily unique. For example, Figures 10.19.4b and 10.19.4c are two different
triangulations of the set of vertex points in Figure 10.19.4a. Since there are various
computer algorithms that perform triangulations very quickly, it is not necessary to
perform the tiresome triangulation task by hand; one need only specify the desired vertex
points and let a computer generate a triangulation from them. If $n$ is the number of vertex
points chosen, it can be shown that the number of triangles $m$ of any triangulation of
those points is given by

$$m = 2n - 2 - k \tag{7}$$

where $k$ is the number of vertex points lying on the boundary of the rectangle, including
the four situated at the corner points.
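
Formula (7) is easy to check on small cases. The helper below is a hypothetical illustration (the name `triangle_count` is not from the text); it simply evaluates $m = 2n - 2 - k$ after confirming that $k$ counts at least the four corners and no more than all $n$ vertex points.

```python
def triangle_count(n, k):
    """Number of triangles m = 2n - 2 - k in a triangulation, Equation (7).

    n is the number of vertex points and k the number of them on the
    boundary of the rectangle (the four corners are always included).
    """
    if not 4 <= k <= n:
        raise ValueError("k must satisfy 4 <= k <= n")
    return 2 * n - 2 - k


print(triangle_count(4, 4))  # 2: the four corners alone, split by one diagonal
print(triangle_count(5, 4))  # 4: the corners plus one interior vertex point
```

The same formula is consistent with the triangulation of 94 vertex points and 179 triangles described for Figure 10.19.6 if seven of those vertex points lie on the boundary, since `triangle_count(94, 7)` evaluates to 179.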

The warp is specified by moving the $n$ vertex points $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$ to new locations
$\mathbf{w}_1, \mathbf{w}_2, \ldots, \mathbf{w}_n$ according to the changes we desire in the picture (Figures 10.19.5a and
10.19.5b). However, we impose two restrictions on the movements of the vertex points:

1. The four vertex points at the corners of the rectangle are to remain fixed, and any
vertex point on a side of the rectangle is to remain fixed or move to another point on 
the same side of the rectangle. All other vertex points are to remain in the interior 
of the rectangle. 

2. The triangles determined by the triangulation are not to overlap after their vertices 
have been moved. 

The first restriction guarantees that the rectangular shape of the begin-picture is preserved.
The second restriction guarantees that the displaced vertex points still form a
triangulation of the rectangle and that the new triangulation is similar to the original one.
For example, Figure 10.19.5c is not an allowable movement of the vertex points shown
in Figure 10.19.5a. Although a violation of this condition can be handled mathematically
without too much additional effort, the resulting warps usually produce unnatural
results and we will not consider them here.

Figure 10.19.6 is a warp of a photograph of a woman using a triangulation with 94
vertex points and 179 triangles. Note that the vertex points in the begin-triangulation
are chosen to lie along key features of the picture (hairline, eyes, lips, etc.). These vertex
points were moved to final positions corresponding to those same features in a picture of
the woman taken 20 years after the begin-picture. Thus, the warped picture represents
the woman forced into her older shape but using her younger gray levels.

► Figure 10.19.6 Begin-picture, warped picture, begin-triangulation, and warped triangulation

Time-Varying Warps

A time-varying warp is the set of warps generated when the vertex points of the begin-picture
are moved continually in time from their original positions to specified final
positions. This gives us a motion picture in which the begin-picture is continually warped
to a final warp. Let us choose time units so that $t = 0$ corresponds to our begin-picture
and $t = 1$ corresponds to our final warp. The simplest way of moving the vertex points
from time 0 to time 1 is with constant velocity along straight-line paths from their initial
positions to their final positions.

To describe such a motion, let $\mathbf{u}_i(t)$ denote the position of the $i$th vertex point at
any time $t$ between 0 and 1. Thus $\mathbf{u}_i(0) = \mathbf{v}_i$ (its given position in the begin-picture) and
$\mathbf{u}_i(1) = \mathbf{w}_i$ (its given position in the final warp). In between, we determine its position
by

$$\mathbf{u}_i(t) = (1 - t)\mathbf{v}_i + t\mathbf{w}_i \tag{8}$$

Note that (8) expresses $\mathbf{u}_i(t)$ as a convex combination of $\mathbf{v}_i$ and $\mathbf{w}_i$ for each $t$ in $[0, 1]$.
Figure 10.19.7 illustrates a time-varying triangulation of a plain rectangular region with
six vertex points. The lines connecting the vertex points at the different times are the
space-time paths of these vertex points in this space-time diagram.
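
Equation (8) translates almost directly into code. The sketch below is illustrative only; the function name and the three sample vertex points are assumptions.

```python
def vertex_positions(v, w, t):
    """Positions u_i(t) = (1 - t) v_i + t w_i of all vertex points, Equation (8)."""
    if not 0 <= t <= 1:
        raise ValueError("t must lie in [0, 1]")
    return [((1 - t) * vx + t * wx, (1 - t) * vy + t * wy)
            for (vx, vy), (wx, wy) in zip(v, w)]


# Hypothetical vertex points: two stay fixed, the third moves from (2, 2) to (3, 1).
v = [(0.0, 0.0), (4.0, 0.0), (2.0, 2.0)]
w = [(0.0, 0.0), (4.0, 0.0), (3.0, 1.0)]
print(vertex_positions(v, w, 0.25))  # third point is one-fourth of the way along
```

At $t = 0$ the function returns the begin positions, at $t = 1$ the final positions, and at $t = 0.25$ the moving point sits at $(2.25, 1.75)$.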


► Figure 10.19.7

Once the positions of the vertex points are computed at time $t$, a warp is performed
between the begin-picture and the triangulation at time $t$ determined by the displaced
vertex points at that time. Figure 10.19.8 shows a time-varying warp at five values of $t$
generated from the warp between $t = 0$ and $t = 1$ shown in Figure 10.19.6.



► Figure 10.19.8 



t = 0.00 t = 0.25 t = 0.50 t = 0.75 t = 1.00


Morphs

A time-varying morph can be described as a blending of two time-varying warps of two
different pictures using two triangulations that match corresponding features in the two
pictures. One of the two pictures is designated as the begin-picture and the other as the
end-picture. First, a time-varying warp from $t = 0$ to $t = 1$ is generated in which the
begin-picture is warped into the shape of the end-picture. Then a time-varying warp
from $t = 1$ to $t = 0$ is generated in which the end-picture is warped into the shape of
the begin-picture. Finally, a weighted average of the gray levels of the two warps at each
time $t$ is produced to generate the morph of the two images at time $t$.

Figure 10.19.9 shows two photographs of a woman taken 20 years apart. Below the
pictures are two corresponding triangulations in which corresponding features of the 
two photographs are matched. The time-varying morph between these two pictures for 
five values of t between 0 and 1 is shown in Figure 10.19.10. 



► Figure 10.19.9 Begin-picture and end-picture

► Figure 10.19.10 t = 0.00 t = 0.25 t = 0.50 t = 0.75 t = 1.00


The procedure for producing such a morph is outlined in the following nine steps
(Figure 10.19.11):

Step 1. Given a begin-picture with picture-density $\rho_0$ and an end-picture with picture-density
$\rho_1$, position $n$ vertex points $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$ in the begin-picture at key
features of that picture.

Step 2. Position $n$ corresponding vertex points $\mathbf{w}_1, \mathbf{w}_2, \ldots, \mathbf{w}_n$ in the end-picture at the
corresponding key features of that picture.

Step 3. Triangulate the begin- and end-pictures in similar ways by drawing lines between
corresponding vertex points in both pictures.

Step 4. For any time $t$ between 0 and 1, find the vertex points $\mathbf{u}_1(t), \mathbf{u}_2(t), \ldots, \mathbf{u}_n(t)$ in
the morph picture at that time, using the formula

$$\mathbf{u}_i(t) = (1 - t)\mathbf{v}_i + t\mathbf{w}_i, \quad i = 1, 2, \ldots, n \tag{9}$$

Step 5. Triangulate the morph picture at time $t$ similar to the begin- and end-picture
triangulations.

Step 6. For any point $\mathbf{u}$ in the morph picture at time $t$, find the triangle in the triangulation
of the morph picture in which it lies and the vertices $\mathbf{u}_I(t)$, $\mathbf{u}_J(t)$, and $\mathbf{u}_K(t)$
of that triangle. (See Exercise 1 to determine whether a given point lies in a given
triangle.)

Step 7. Express $\mathbf{u}$ as a convex combination of $\mathbf{u}_I(t)$, $\mathbf{u}_J(t)$, and $\mathbf{u}_K(t)$ by finding the
constants $c_I$, $c_J$, and $c_K$ such that

$$\mathbf{u} = c_I\mathbf{u}_I(t) + c_J\mathbf{u}_J(t) + c_K\mathbf{u}_K(t) \tag{10}$$

and

$$c_I + c_J + c_K = 1 \tag{11}$$

Step 8. Determine the locations of the point $\mathbf{u}$ in the begin- and end-pictures using

$$\mathbf{v} = c_I\mathbf{v}_I + c_J\mathbf{v}_J + c_K\mathbf{v}_K \quad \text{(in the begin-picture)} \tag{12}$$

and

$$\mathbf{w} = c_I\mathbf{w}_I + c_J\mathbf{w}_J + c_K\mathbf{w}_K \quad \text{(in the end-picture)} \tag{13}$$

Step 9. Finally, determine the picture-density $\rho_t(\mathbf{u})$ of the morph-picture at the point $\mathbf{u}$
using

$$\rho_t(\mathbf{u}) = (1 - t)\rho_0(\mathbf{v}) + t\rho_1(\mathbf{w}) \tag{14}$$

► Figure 10.19.11 Time = 0: begin-picture with given density $\rho_0(\mathbf{v})$. Time = t: morph-picture with computed density $\rho_t(\mathbf{u}) = (1 - t)\rho_0(\mathbf{v}) + t\rho_1(\mathbf{w})$. Time = 1: end-picture with given density $\rho_1(\mathbf{w})$.

Step 9 is the key step in distinguishing a warp from a morph. Equation (14) takes 
weighted averages of the gray levels of the begin- and end-pictures to produce the gray 
levels of the morph-picture. The weights depend on the fraction of the distances that the 
vertex points have moved from their beginning positions to their ending positions. For 
example, if the vertex points have moved one-fourth of the way to their destinations (i.e., 
if t = 0.25), then we use one-fourth of the gray levels of the end-picture and three-fourths 
of the gray levels of the begin-picture. Thus, as time progresses, not only does the shape 
of the begin-picture gradually change into the shape of the end-picture (as in a warp) 
but the gray levels of the begin-picture also gradually change into the gray levels of the 
end-picture. 
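
Steps 4 and 6 through 9 can be sketched for a single point as follows. This is a rough illustration under stated assumptions rather than a production morphing routine: the function names, the five sample vertex points, the list of triangles (given as index triples), and the two sample densities are all hypothetical, and the triangulation (Steps 3 and 5) is supplied by hand instead of being computed.

```python
def barycentric(p, tri):
    """Coefficients (c1, c2, c3) of p with respect to the triangle tri, summing to 1."""
    (x1, y1), (x2, y2), (x3, y3) = tri
    det = (x1 - x3) * (y2 - y3) - (x2 - x3) * (y1 - y3)
    c1 = ((p[0] - x3) * (y2 - y3) - (x2 - x3) * (p[1] - y3)) / det
    c2 = ((x1 - x3) * (p[1] - y3) - (p[0] - x3) * (y1 - y3)) / det
    return c1, c2, 1 - c1 - c2


def combine(c, pts):
    """Return the combination c[0]*pts[0] + c[1]*pts[1] + c[2]*pts[2]."""
    return tuple(sum(ci * p[r] for ci, p in zip(c, pts)) for r in range(2))


def morph_density(u, t, v_pts, w_pts, triangles, rho0, rho1):
    """Gray level of the morph-picture at the point u and time t."""
    # Step 4, Equation (9): vertex points of the morph picture at time t.
    u_pts = [tuple((1 - t) * a + t * b for a, b in zip(v, w))
             for v, w in zip(v_pts, w_pts)]
    for i, j, k in triangles:
        # Steps 6 and 7: u lies in this triangle when no coefficient is negative.
        c = barycentric(u, (u_pts[i], u_pts[j], u_pts[k]))
        if min(c) >= -1e-12:
            v = combine(c, (v_pts[i], v_pts[j], v_pts[k]))  # Equation (12)
            w = combine(c, (w_pts[i], w_pts[j], w_pts[k]))  # Equation (13)
            return (1 - t) * rho0(v) + t * rho1(w)          # Equation (14)
    raise ValueError("u lies outside the triangulated rectangle")


# Hypothetical data: a 4 x 4 square whose one interior vertex moves from (1, 2) to (3, 2).
v_pts = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0), (1.0, 2.0)]
w_pts = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0), (3.0, 2.0)]
triangles = [(0, 1, 4), (1, 2, 4), (2, 3, 4), (3, 0, 4)]
rho0 = lambda p: 10.0 * p[0]   # assumed begin-picture density
rho1 = lambda p: 20.0 * p[0]   # assumed end-picture density

print(morph_density((2.0, 1.0), 0.5, v_pts, w_pts, triangles, rho0, rho1))
```

For the sample point $(2, 1)$ at $t = 0.5$, the coefficients are $(1/4, 1/4, 1/2)$, the corresponding points are $\mathbf{v} = (1.5, 1)$ and $\mathbf{w} = (2.5, 1)$, and the blended gray level is $0.5 \cdot 15 + 0.5 \cdot 50 = 32.5$.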

The procedure described above to generate a morph is cumbersome to perform by
hand, but it is the kind of dull, repetitive procedure at which computers excel. A successful
morph demands good preparation and requires more artistic ability than mathematical
ability. (The software designer is required to have the mathematical ability.) The
two photographs to be morphed should be carefully chosen so that they have matching
features, and the vertex points in the two photographs also should be carefully chosen so
that the triangles in the two resulting triangulations contain similar features of the two
pictures. When the procedure is done correctly, each frame of the morph should look
just as “real” as the begin- and end-pictures.

The techniques we have discussed in this section can be generalized in numerous ways 
to produce much more elaborate warps and morphs. For example: 

1. If the pictures are in color, the three components of the picture colors (red, green,
and blue) can be morphed separately to produce a color morph. 

2. Rather than following straight-line paths to their destinations, the vertices of a tri- 
angulation can be directed separately along more complicated paths to produce a 
variety of results. 

3. Rather than travel with constant speeds along their paths, the vertices of a triangu- 
lation can be directed to have different speeds at different times. For example, in a 
morph between two faces, the hairline can be made to change first, then the nose, 
and so forth. 

4. Similarly, the gray-level mixing of the begin-picture and end-picture at different times
and different vertices can be varied in a more complicated way than that in Equation 
(14). 

5. One can morph two surfaces in three-dimensional space (representing two complete 
heads, for example) by triangulating the surfaces and using the techniques in this 
section. 

6. One can morph two solids in three-dimensional space (for example, two
three-dimensional tomographs of a beating human heart at two different times) by dividing
the two solids into corresponding tetrahedral regions. 

7. Two film strips can be morphed frame by frame by different amounts between each
pair of frames to produce a morphed film strip in which, say, an actor walking along 
a set is gradually morphed into an ape walking along the set. 

8. Instead of using straight lines to triangulate two pictures to be morphed, more com- 
plicated curves, such as spline curves, can be matched between the two pictures. 

9. Three or more pictures can be morphed together by generalizing the formulas given 
in this section. 

These and other generalizations have made warping and morphing two of the most active 
areas in computer graphics. 


Exercise Set 10.19 

1. Determine whether the vector $\mathbf{v}$ is a convex combination of the
vectors $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$. Do this by solving Equations (1) and (3)
for $c_1$, $c_2$, and $c_3$ and ascertaining whether these coefficients are
nonnegative.


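(a) $\mathbf{v} = \begin{bmatrix} 3 \\ 3 \end{bmatrix}$, $\mathbf{v}_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$, $\mathbf{v}_2 = \begin{bmatrix} 3 \\ 5 \end{bmatrix}$, $\mathbf{v}_3 = \begin{bmatrix} 4 \\ 2 \end{bmatrix}$

(b) $\mathbf{v} = \begin{bmatrix} 2 \\ 4 \end{bmatrix}$, $\mathbf{v}_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$, $\mathbf{v}_2 = \begin{bmatrix} 3 \\ 5 \end{bmatrix}$, $\mathbf{v}_3 = \begin{bmatrix} 4 \\ 2 \end{bmatrix}$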


(c) V = 


V| = 


V2 = 


V 3 = 


1 


3 


-2 


3 


- Vl = 


- V 2 = 


- V 3 = 


0 


3 


-2 


0 


2. Verify Equation (7) for the two triangulations given in Fig- 
ure 10.19.4. 

3. Let an affine transformation be given by a $2 \times 2$ matrix $M$ and
a two-dimensional vector $\mathbf{b}$. Let $\mathbf{v} = c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + c_3\mathbf{v}_3$, where
$c_1 + c_2 + c_3 = 1$; let $\mathbf{w} = M\mathbf{v} + \mathbf{b}$; and let $\mathbf{w}_i = M\mathbf{v}_i + \mathbf{b}$ for
$i = 1, 2, 3$. Show that $\mathbf{w} = c_1\mathbf{w}_1 + c_2\mathbf{w}_2 + c_3\mathbf{w}_3$. (This shows
that an affine transformation maps a convex combination of
vectors to the same convex combination of the images of the
vectors.)

4. (a) Exhibit a triangulation of the points in Figure 10.19.4 in
which the points $\mathbf{v}_3$, $\mathbf{v}_5$, and $\mathbf{v}_6$ form the vertices of a single
triangle.

(b) Exhibit a triangulation of the points in Figure 10.19.4 in 
which the points V 2 , V 5 , and do not form the vertices of a 
single triangle. 


$\mathbf{v}$ lies on one of the three sides of the triangle determined by
the three vectors $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$?

(c) What can you say about the coefficients $c_1$, $c_2$, and $c_3$ that
determine a convex combination $\mathbf{v} = c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + c_3\mathbf{v}_3$ if
$\mathbf{v}$ lies in the interior of the triangle determined by the three
vectors $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$?


5. Find the $2 \times 2$ matrix $M$ and two-dimensional vector $\mathbf{b}$ that
define the affine transformation that maps the three vectors $\mathbf{v}_1$,
$\mathbf{v}_2$, and $\mathbf{v}_3$ to the three vectors $\mathbf{w}_1$, $\mathbf{w}_2$, and $\mathbf{w}_3$. Do this by setting
up a system of six linear equations for the four entries of the
matrix $M$ and the two entries of the vector $\mathbf{b}$.


(a) V! = 


Wj = 


(b) vi = 


T 


'2' 


"2" 


, v 2 = 


, v 3 = 


_ 1 _ 


_3_ 


_ 1 _ 


w 2 = 


w 3 = 


8. (a) The centroid of a triangle lies on the line segment connecting
any one of the three vertices of the triangle with the
midpoint of the opposite side. Its location on this line segment
is two-thirds of the distance from the vertex. If the
three vertices are given by the vectors $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$, write
the centroid as a convex combination of these three vectors.


(b) Use your result in part (a) to find the vector defining the 

- F5' 

centroid of the triangle with the three vertices 








T 

~-2 


'O' 


'2' 

and 


, v 2 = 

0 

, v 3 = 

_ 1 _ 


1 

2_ 




Wi 

(c) V! 

Wi 


(d) V! 




- 5- 


-7- 


- 7 - 

W! = 

2 

, W 2 = 

2 

_3_ 

, w 3 = 

2 

9_ 


Working with Technology 

The following exercises are designed to be solved using a technology
utility. Typically, this will be MATLAB, Mathematica, Maple,
Derive, or Mathcad, but it may also be some other type of linear 
algebra software or a scientific calculator with some linear algebra 
capabilities. For each exercise you will need to read the relevant 
documentation for the particular utility you are using. The goal 
of these exercises is to provide you with a basic proficiency with 
your technology utility. Once you have mastered the techniques 
in these exercises, you will be able to use your technology utility 
to solve many of the problems in the regular exercise sets. 


6. (a) Let $\mathbf{a}$ and $\mathbf{b}$ be linearly independent vectors in the plane.
Show that if $c_1$ and $c_2$ are nonnegative numbers such that
$c_1 + c_2 = 1$, then the vector $c_1\mathbf{a} + c_2\mathbf{b}$ lies on the line segment
connecting the tips of the vectors $\mathbf{a}$ and $\mathbf{b}$.

(b) Let $\mathbf{a}$ and $\mathbf{b}$ be linearly independent vectors in the plane.
Show that if $c_1$ and $c_2$ are nonnegative numbers such that
$c_1 + c_2 \le 1$, then the vector $c_1\mathbf{a} + c_2\mathbf{b}$ lies in the triangle
connecting the origin and the tips of the vectors $\mathbf{a}$ and $\mathbf{b}$.
[Hint: First examine the vector $c_1\mathbf{a} + c_2\mathbf{b}$ multiplied by the
scale factor $1/(c_1 + c_2)$.]

(c) Let $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$ be noncollinear points in the plane. Show
that if $c_1$, $c_2$, and $c_3$ are nonnegative numbers such that
$c_1 + c_2 + c_3 = 1$, then the vector $c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + c_3\mathbf{v}_3$ lies in
the triangle connecting the tips of the three vectors. [Hint:
Let $\mathbf{a} = \mathbf{v}_1 - \mathbf{v}_3$ and $\mathbf{b} = \mathbf{v}_2 - \mathbf{v}_3$, and then use Equation (1)
and part (b) of this exercise.]

7. (a) What can you say about the coefficients $c_1$, $c_2$, and $c_3$ that
determine a convex combination $\mathbf{v} = c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + c_3\mathbf{v}_3$ if
$\mathbf{v}$ lies on one of the three vertices of the triangle determined
by the three vectors $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$?

(b) What can you say about the coefficients $c_1$, $c_2$, and $c_3$ that
determine a convex combination $\mathbf{v} = c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + c_3\mathbf{v}_3$ if


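T1. To warp or morph a surface in $R^3$ we must be able to triangulate the surface. Let

$$\mathbf{v}_1 = \begin{bmatrix} v_{11} \\ v_{12} \\ v_{13} \end{bmatrix}, \quad \mathbf{v}_2 = \begin{bmatrix} v_{21} \\ v_{22} \\ v_{23} \end{bmatrix}, \quad \text{and} \quad \mathbf{v}_3 = \begin{bmatrix} v_{31} \\ v_{32} \\ v_{33} \end{bmatrix}$$

be three noncollinear vectors on the surface. Then a vector
$\mathbf{v} = \begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix}$ lies in the triangle formed by these three vectors if and
only if $\mathbf{v}$ is a convex combination of the three vectors; that is,
$\mathbf{v} = c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + c_3\mathbf{v}_3$ for some nonnegative coefficients $c_1$, $c_2$,
and $c_3$ whose sum is 1.

(a) Show that in this case, $c_1$, $c_2$, and $c_3$ are solutions of the following
linear system:

$$\begin{bmatrix} v_{11} & v_{21} & v_{31} \\ v_{12} & v_{22} & v_{32} \\ v_{13} & v_{23} & v_{33} \\ 1 & 1 & 1 \end{bmatrix} \begin{bmatrix} c_1 \\ c_2 \\ c_3 \end{bmatrix} = \begin{bmatrix} v_1 \\ v_2 \\ v_3 \\ 1 \end{bmatrix}$$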





In parts (b)-(d) determine whether the vector v is a convex combi- 



2 


'3' 


2 

nation of the vectors Vi = 

7 

, v 2 = 

0 

, and v 3 = 

2 


-5 


9 


-4 
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1 

9 

1 

10 

1 

13 

V = - 

4 

9 

9 

(c) v= 4 

9 

9 

(d) v= - 

-7 

50 


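T2. To warp or morph a solid object in $R^3$ we first partition the
object into disjoint tetrahedrons. Let

$$\mathbf{v}_1 = \begin{bmatrix} v_{11} \\ v_{12} \\ v_{13} \end{bmatrix}, \quad \mathbf{v}_2 = \begin{bmatrix} v_{21} \\ v_{22} \\ v_{23} \end{bmatrix}, \quad \mathbf{v}_3 = \begin{bmatrix} v_{31} \\ v_{32} \\ v_{33} \end{bmatrix}, \quad \text{and} \quad \mathbf{v}_4 = \begin{bmatrix} v_{41} \\ v_{42} \\ v_{43} \end{bmatrix}$$

be four noncoplanar vectors. Then a vector
$\mathbf{v} = \begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix}$ lies in the solid tetrahedron formed by these
four vectors if and only if $\mathbf{v}$ is a convex combination of the four
vectors; that is, $\mathbf{v} = c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + c_3\mathbf{v}_3 + c_4\mathbf{v}_4$ for some nonnegative
coefficients $c_1$, $c_2$, $c_3$, and $c_4$ whose sum is one.

(a) Show that in this case, $c_1$, $c_2$, $c_3$, and $c_4$ are solutions of the
following linear system:

$$\begin{bmatrix} v_{11} & v_{21} & v_{31} & v_{41} \\ v_{12} & v_{22} & v_{32} & v_{42} \\ v_{13} & v_{23} & v_{33} & v_{43} \\ 1 & 1 & 1 & 1 \end{bmatrix} \begin{bmatrix} c_1 \\ c_2 \\ c_3 \\ c_4 \end{bmatrix} = \begin{bmatrix} v_1 \\ v_2 \\ v_3 \\ 1 \end{bmatrix}$$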

In parts (b)-(d) determine whether the vector v is a convex 



2 


'-3' 


"7" 

combination of the vectors = 

-6 

, v 2 = 

4 

, v 3 = 

2 


1 


2 


3 


(b) v = 

1 

O L/i 

1 

(c) v = 

T 

1 

(d) v = 

"f 

2 


7 


2 


2 


10.20 Internet Search Engines 

In this section we describe a basic technique used by Internet search engines to rank web 
pages according to their importance. 


Basic Probability Concepts 
Intuitive Understanding of Limits 
Eigenvalues and Eigenvectors 

Dynamical Systems and Markov Chains (Section 5.5) or 
Markov Chains (Section 10.4) 


Web Surfing

Assume that Alice and Bob are each given a set of six web pages containing key words on
a topic of common interest. Each has his or her own strategy for establishing an order
of importance for the pages.


Alice's Strategy 



▲ Figure 10.20.1

Alice decides that the network of links (references) between the pages can provide a
means of measuring their relative importance, so she draws a diagram called a webgraph
that shows the links among the six web pages (Figure 10.20.1). A directed path from the
$i$th page to the $j$th page means that the $i$th page has an outgoing link to the $j$th page
(i.e., it references that page).

Alice proceeds as follows: 

She disregards links to or from pages outside the six given pages. 

She disregards links from a page to itself. 

She disregards duplicate links. 

She assumes there are no dangling pages (i.e., pages with no outgoing links). 

Alice then designs a “web surfing” strategy in which she picks one of the pages (say
Page 2), clicks on one of its links, and connects to another page. She then repeats the
procedure starting from the new page, and thereby surfs from page to page. She tracks
how many times she visits each page in the set after 10, 100, 1000, 10,000, and 20,000
mouse clicks and creates Table 1. (Notice that the number of pages visited is one more
than the number of mouse clicks.)


Table 1 Number of Visits to Each Page

            Total Number of Mouse Clicks
Page       0      10     100    1,000   10,000   20,000
  1        0       3      21      165    1,504    3,012
  2        1       2      16      148    1,391    2,790
  3        0       3      27      271    2,706    5,424
  4        0       0       4      100    1,096    2,206
  5        0       2      22      155    1,415    2,745
  6        0       1      11      162    1,889    3,824


She also creates Table 2 in which she computes the fraction of visits to each page to four 
decimal places. 


Table 2 Fraction of Visits to Each Page

            Total Number of Mouse Clicks
Page        0        10       100     1,000    10,000    20,000
  1      0.0000    0.2727    0.2079    0.1648    0.1504    0.1506
  2      1.0000    0.1818    0.1584    0.1479    0.1391    0.1395
  3      0.0000    0.2727    0.2673    0.2707    0.2706    0.2712
  4      0.0000    0.0000    0.0396    0.0999    0.1096    0.1103
  5      0.0000    0.1818    0.2178    0.1548    0.1415    0.1372
  6      0.0000    0.0909    0.1089    0.1618    0.1889    0.1912


Based on 20,000 repetitions she identifies Page 3 as the most important since it was 
visited the most often, and she ranks the pages in decreasing order of importance: 

3, 6, 1, 2, 5, 4 (1) 
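
The last column of Table 2 and the ranking in (1) can be recomputed from the visit counts in Table 1. The sketch below does this for the 20,000-click column; the function names are hypothetical, and only the counts come from the text.

```python
# Visit counts after 20,000 mouse clicks (last column of Table 1).
visits = {1: 3012, 2: 2790, 3: 5424, 4: 2206, 5: 2745, 6: 3824}


def visit_fractions(visits):
    """Fraction of all page visits made to each page, to four decimal places."""
    total = sum(visits.values())  # one more than the number of mouse clicks
    return {page: round(count / total, 4) for page, count in visits.items()}


def ranking(visits):
    """Pages in decreasing order of the number of visits."""
    return sorted(visits, key=visits.get, reverse=True)


print(visit_fractions(visits))
print(ranking(visits))  # [3, 6, 1, 2, 5, 4]
```

The total number of visits is 20,001, one more than the number of clicks, which is why the fractions are computed with that denominator.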

Markov Matrix Approach

Observe that for each page listed in Table 2 the fractions seem to stabilize. This is not
accidental; we will see that for this example regardless of which page is chosen initially
and which outgoing links are chosen subsequently, the fraction of visits to each page will
converge to a limiting value that depends only on the structure of the webgraph. The
limiting values of these fractions, called page ranks, can be taken as a measure of the
relative importance of the pages.

Although the procedure used by Alice is satisfactory for her small webgraph, it is not
workable for large webgraphs such as the World Wide Web (WWW). For large webgraphs,
the same ranking that Alice obtained can be accomplished more efficiently using Markov
chains (see Section 5.5 or 10.4). As a first step in explaining that method, we define the
adjacency matrix of a webgraph with $n$ pages to be the $n \times n$ matrix $A$ whose $ij$th entry
$a_{ij}$ is 1 if the $j$th page has an outgoing link to the $i$th page and 0 otherwise. For example,
you should be able to see that the adjacency matrix for the webgraph in Figure 10.20.1 is

$$A = \begin{bmatrix} 0 & 1 & 0 & 0 & 1 & 1 \\ 0 & 0 & 1 & 0 & 1 & 0 \\ 1 & 0 & 0 & 1 & 0 & 1 \\ 0 & 1 & 0 & 0 & 0 & 1 \\ 0 & 1 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 1 & 1 & 0 \end{bmatrix} \tag{2}$$

Notice that 

The sum of the entries of the $i$th row of an adjacency matrix is the number of incoming
links to the $i$th page from the other pages.

The sum of the entries in the $j$th column is the number of outgoing links on the $j$th
page to other pages.

We make the following definition. 


DEFINITION 1 If a webgraph with $n$ pages is “surfed” by clicking a mouse, then the
state vector $\mathbf{x}^{(k)}$ is the $n \times 1$ column vector whose $i$th entry is the probability that the
surfer is on the $i$th page after $k$ random mouse clicks.


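To illustrate this idea, suppose that it is known with certainty that a surfer is on the
$j$th page after $k$ mouse clicks, in which case the $j$th entry of $\mathbf{x}^{(k)}$ is 1 and all other entries
are 0. It follows from this that the product $A\mathbf{x}^{(k)}$ is the $j$th column vector of $A$ (verify).
For example, if we know with certainty that Alice begins surfing from Page 2, then her
initial state vector $\mathbf{x}^{(0)}$ and the product $A\mathbf{x}^{(0)}$ would be

$$\mathbf{x}^{(0)} = \begin{bmatrix} 0 \\ 1 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix} \quad \text{and} \quad A\mathbf{x}^{(0)} = \begin{bmatrix} 0 & 1 & 0 & 0 & 1 & 1 \\ 0 & 0 & 1 & 0 & 1 & 0 \\ 1 & 0 & 0 & 1 & 0 & 1 \\ 0 & 1 & 0 & 0 & 0 & 1 \\ 0 & 1 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 1 & 1 & 0 \end{bmatrix} \begin{bmatrix} 0 \\ 1 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \\ 0 \\ 1 \\ 1 \\ 0 \end{bmatrix}$$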


The unit entries of Ax (0) tell us that from Page 2 Alice has the option of going to 
either Page 1 , 4, or 5 since these are the only pages to which there are outgoing links. 
Assuming that Alice chooses outgoing links randomly, each of these three pages would 
have probability 1/3 of being chosen. 
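
Alice's surfing experiment can be imitated with a short simulation. The sketch below is an assumption-laden illustration: the dictionary `LINKS` is read off the columns of the adjacency matrix (2), the function name `surf` and the use of a seeded pseudorandom generator are choices made here, and the counts it produces will differ from those in Table 1 because the random choices differ.

```python
import random

# Outgoing links of each page, read off the columns of the adjacency matrix (2).
LINKS = {
    1: [3],
    2: [1, 4, 5],
    3: [2, 5, 6],
    4: [3, 6],
    5: [1, 2, 6],
    6: [1, 3, 4],
}


def surf(links, start, clicks, seed=0):
    """Count page visits while following randomly chosen outgoing links."""
    rng = random.Random(seed)  # seeded so that a run can be repeated
    visits = {page: 0 for page in links}
    page = start
    visits[page] += 1          # the starting page counts as a visit
    for _ in range(clicks):
        page = rng.choice(links[page])  # each outgoing link is equally likely
        visits[page] += 1
    return visits


visits = surf(LINKS, start=2, clicks=20000)
total = sum(visits.values())
for page in sorted(visits):
    print(page, visits[page], round(visits[page] / total, 4))
```

With 20,000 clicks the fractions should settle near the last column of Table 2, although the exact counts depend on the seed.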

To formalize Alice’s idea of choosing outgoing links randomly, we make the following
definition.


DEFINITION 2 The probability transition matrix $B = [b_{ij}]$ associated with an adjacency
matrix $A = [a_{ij}]$ is the matrix obtained by dividing each entry of $A$ by the sum
of the entries in the same column; that is,

$$b_{ij} = \frac{a_{ij}}{\sum_{k=1}^{n} a_{kj}}$$

You should be able to see that $0 \le b_{ij} \le 1$ and that the entries in each column of $B$ sum
up to 1. As an example, the probability transition matrix associated with (2) is

$$B = \begin{bmatrix} 0 & \tfrac{1}{3} & 0 & 0 & \tfrac{1}{3} & \tfrac{1}{3} \\ 0 & 0 & \tfrac{1}{3} & 0 & \tfrac{1}{3} & 0 \\ 1 & 0 & 0 & \tfrac{1}{2} & 0 & \tfrac{1}{3} \\ 0 & \tfrac{1}{3} & 0 & 0 & 0 & \tfrac{1}{3} \\ 0 & \tfrac{1}{3} & \tfrac{1}{3} & 0 & 0 & 0 \\ 0 & 0 & \tfrac{1}{3} & \tfrac{1}{2} & \tfrac{1}{3} & 0 \end{bmatrix} \tag{3}$$
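
Definition 2 amounts to dividing each column of $A$ by its sum. A minimal sketch follows; the function name is hypothetical, and exact fractions are used so that the result can be compared entry by entry with (3).

```python
from fractions import Fraction

# Adjacency matrix (2) of the webgraph in Figure 10.20.1.
A = [
    [0, 1, 0, 0, 1, 1],
    [0, 0, 1, 0, 1, 0],
    [1, 0, 0, 1, 0, 1],
    [0, 1, 0, 0, 0, 1],
    [0, 1, 1, 0, 0, 0],
    [0, 0, 1, 1, 1, 0],
]


def transition_matrix(A):
    """Divide each entry of A by the sum of the entries in its column."""
    n = len(A)
    col_sums = [sum(A[k][j] for k in range(n)) for j in range(n)]
    if 0 in col_sums:
        raise ValueError("a dangling page has no outgoing links")
    return [[Fraction(A[i][j], col_sums[j]) for j in range(n)] for i in range(n)]


B = transition_matrix(A)
for row in B:
    print([str(entry) for entry in row])
```

Each column of the result sums to 1, and the check for a zero column sum reflects Alice's assumption that there are no dangling pages.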


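This matrix incorporates the probability information for advancing randomly from one
page to the next with a mouse click. For example, if we know with certainty that Alice
is initially on Page 2, then her initial state vector is

$$\mathbf{x}^{(0)} = \begin{bmatrix} 0 \\ 1 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}$$

Her state vector to four decimal places after one mouse click will be

$$\mathbf{x}^{(1)} = B\mathbf{x}^{(0)} = \begin{bmatrix} 0 & \tfrac{1}{3} & 0 & 0 & \tfrac{1}{3} & \tfrac{1}{3} \\ 0 & 0 & \tfrac{1}{3} & 0 & \tfrac{1}{3} & 0 \\ 1 & 0 & 0 & \tfrac{1}{2} & 0 & \tfrac{1}{3} \\ 0 & \tfrac{1}{3} & 0 & 0 & 0 & \tfrac{1}{3} \\ 0 & \tfrac{1}{3} & \tfrac{1}{3} & 0 & 0 & 0 \\ 0 & 0 & \tfrac{1}{3} & \tfrac{1}{2} & \tfrac{1}{3} & 0 \end{bmatrix} \begin{bmatrix} 0 \\ 1 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix} = \begin{bmatrix} \tfrac{1}{3} \\ 0 \\ 0 \\ \tfrac{1}{3} \\ \tfrac{1}{3} \\ 0 \end{bmatrix} = \begin{bmatrix} 0.3333 \\ 0 \\ 0 \\ 0.3333 \\ 0.3333 \\ 0 \end{bmatrix}$$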


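and her state vectors resulting from successive mouse clicks will form the sequence

$$\mathbf{x}^{(k)} = B\mathbf{x}^{(k-1)}, \quad k = 1, 2, 3, \ldots \tag{4}$$

It follows from this that her successive state vectors rounded to four decimal places will
be

$$\mathbf{x}^{(0)} = \begin{bmatrix} 0 \\ 1 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}, \quad \mathbf{x}^{(1)} = \begin{bmatrix} 0.3333 \\ 0 \\ 0 \\ 0.3333 \\ 0.3333 \\ 0 \end{bmatrix}, \quad \mathbf{x}^{(2)} = \begin{bmatrix} 0.1111 \\ 0.1111 \\ 0.5000 \\ 0 \\ 0 \\ 0.2778 \end{bmatrix}, \quad \mathbf{x}^{(3)} = \begin{bmatrix} 0.1296 \\ 0.1667 \\ 0.2037 \\ 0.1296 \\ 0.2037 \\ 0.1667 \end{bmatrix}$$

$$\mathbf{x}^{(5)} = \begin{bmatrix} 0.1533 \\ 0.1245 \\ 0.3014 \\ 0.1121 \\ 0.1286 \\ 0.1800 \end{bmatrix}, \quad \mathbf{x}^{(10)} = \begin{bmatrix} 0.1562 \\ 0.1366 \\ 0.2700 \\ 0.1101 \\ 0.1366 \\ 0.1905 \end{bmatrix}, \quad \mathbf{x}^{(15)} = \begin{bmatrix} 0.1544 \\ 0.1365 \\ 0.2727 \\ 0.1090 \\ 0.1365 \\ 0.1910 \end{bmatrix}$$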
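
The iteration in (4) can be carried out with a few lines of code. The sketch below is one possible arrangement with hypothetical function names; it stores the matrix (3) as exact fractions, so the early state vectors come out as exact rationals, and it rounds only when printing.

```python
from fractions import Fraction as F

# Probability transition matrix (3).
B = [
    [0, F(1, 3), 0, 0, F(1, 3), F(1, 3)],
    [0, 0, F(1, 3), 0, F(1, 3), 0],
    [1, 0, 0, F(1, 2), 0, F(1, 3)],
    [0, F(1, 3), 0, 0, 0, F(1, 3)],
    [0, F(1, 3), F(1, 3), 0, 0, 0],
    [0, 0, F(1, 3), F(1, 2), F(1, 3), 0],
]


def step(B, x):
    """One mouse click: return the matrix-vector product B x."""
    return [sum(bij * xj for bij, xj in zip(row, x)) for row in B]


def iterate(B, x0, k):
    """Return x^(k) obtained from x^(0) = x0 by applying (4) k times."""
    x = list(x0)
    for _ in range(k):
        x = step(B, x)
    return x


x0 = [0, 1, 0, 0, 0, 0]  # Alice starts on Page 2 with certainty
for k in (1, 2, 3, 5, 10, 15):
    print(k, [round(float(value), 4) for value in iterate(B, x0, k)])
```

The printed vectors can be compared with the state vectors listed above; for instance, the exact value of the second iterate is $(1/9, 1/9, 1/2, 0, 0, 5/18)$.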
Eigenvector of the Transition Matrix

If we accept that the state vector $\mathbf{x}^{(k)}$ approaches a limit $\mathbf{x}$ as $k$ (the number of mouse
clicks) increases without bound, then it follows from (4) that

$$\mathbf{x} = B\mathbf{x} \tag{5}$$

That is, $\mathbf{x}$ is an eigenvector of $B$ corresponding to the eigenvalue 1. If $\mathbf{x}$ is scaled so that
its entries sum to 1, then the entries of $\mathbf{x}$ can be interpreted as the fraction of times that
we can expect each page to be visited as the number of mouse clicks increases without
bound. For example, with the help of a computer algebra system such as MATLAB,
Mathematica, or Maple one can show that for the matrix $B$ in (3) such an eigenvector is

$$\mathbf{x} = \frac{1}{110}\begin{bmatrix} 17 \\ 15 \\ 30 \\ 12 \\ 15 \\ 21 \end{bmatrix} \approx \begin{bmatrix} 0.1545 \\ 0.1364 \\ 0.2727 \\ 0.1091 \\ 0.1364 \\ 0.1909 \end{bmatrix}$$

Compare this to the results that Alice obtained in Table 2.

Bob's Strategy


Although Bob agrees with Alice’s definition of page rank, he realizes that it can be
misleading for certain webgraphs. For example, in Figure 10.20.2a the webgraph consists
of two unlinked page clusters. In this case should the initial state vector have zero
probabilities for all of the pages in one of the clusters, then so will all subsequent state
vectors so that no pages in that cluster will ever be accessed. A more subtle example is
illustrated in Figure 10.20.2b. In this case the cluster of Pages 1, 2, and 3 has no outgoing
links to the cluster of Pages 4, 5, and 6, so once a surfer exits cluster 4, 5, 6 the surfer
will be “trapped” in cluster 1, 2, 3 and the fractional page counts for Pages 4, 5, and 6
will approach zero, thereby assigning the pages in that cluster a page rank of 0.

► Figure 10.20.2 (a) (b)


Bob’s solution to this problem is to assume that he is not required to follow only
the links on his current page but can with a certain probability choose any page in the
network to go to next. Specifically, Bob assumes that there is a probability of $\delta$, called
the damping factor, that he will go to the next page by choosing a link on the current
page and a probability of $1 - \delta$ that he will choose the next page at random. If there
are $n$ pages in the network, then in the latter case the probability that he will choose any
particular page at random is

$$\frac{1 - \delta}{n}$$

To implement his strategy Bob creates a new probability transition matrix $M = [m_{ij}]$ in
which

$$m_{ij} = \delta b_{ij} + \frac{1 - \delta}{n} \tag{6}$$

with $b_{ij}$ as given in Definition 2. He then replaces (4) with the iterative scheme

$$\mathbf{x}^{(k)} = M\mathbf{x}^{(k-1)}, \quad k = 1, 2, 3, \ldots \tag{7}$$

In Exercise 4 we will ask you to show that $M$ is a probability transition matrix; that
is, its entries are nonnegative and the entries in each column sum to 1. We will also ask
you to show that $M$ can be written as

$$M = \delta B + \frac{1 - \delta}{n}\begin{bmatrix} 1 & 1 & \cdots & 1 \\ 1 & 1 & \cdots & 1 \\ \vdots & \vdots & & \vdots \\ 1 & 1 & \cdots & 1 \end{bmatrix} \tag{8}$$

with $B$ as given in Definition 2. It follows from this that the iterative scheme in (7) can
be written in the form

$$\mathbf{x}^{(k)} = \delta B\mathbf{x}^{(k-1)} + \frac{1 - \delta}{n}\begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix}$$

As an example, consider the webgraph in Figure 10.20.2b. Its adjacency matrix and
accompanying transition matrix are

$$A = \begin{bmatrix} 0 & 1 & 0 & 0 & 1 & 1 \\ 0 & 0 & 1 & 0 & 1 & 1 \\ 1 & 1 & 0 & 1 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 1 & 0 \end{bmatrix} \quad \text{and} \quad B = \begin{bmatrix} 0 & \tfrac{1}{2} & 0 & 0 & \tfrac{1}{3} & \tfrac{1}{4} \\ 0 & 0 & 1 & 0 & \tfrac{1}{3} & \tfrac{1}{4} \\ 1 & \tfrac{1}{2} & 0 & \tfrac{1}{2} & 0 & \tfrac{1}{4} \\ 0 & 0 & 0 & 0 & 0 & \tfrac{1}{4} \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & \tfrac{1}{2} & \tfrac{1}{3} & 0 \end{bmatrix}$$


As is common, we will choose an initial state vector in which all entries are equal: 


x 


( 0 ) _ 


-1/6- 

1/6 

1/6 

1/6 

1/6 

. 1 / 6 . 


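Alice’s iterative strategy $\mathbf{x}^{(k+1)} = B\mathbf{x}^{(k)}$ yields 

$$\mathbf{x}^{(0)} = \begin{bmatrix} 0.1667 \\ 0.1667 \\ 0.1667 \\ 0.1667 \\ 0.1667 \\ 0.1667 \end{bmatrix}, \quad \mathbf{x}^{(5)} = \begin{bmatrix} 0.1999 \\ 0.4043 \\ 0.3930 \\ 0.0007 \\ 0.0000 \\ 0.0022 \end{bmatrix}, \quad \mathbf{x}^{(10)} = \begin{bmatrix} 0.1992 \\ 0.4015 \\ 0.3992 \\ 0.0000 \\ 0.0000 \\ 0.0000 \end{bmatrix}, \quad \mathbf{x}^{(15)} = \begin{bmatrix} 0.1998 \\ 0.4002 \\ 0.4000 \\ 0.0000 \\ 0.0000 \\ 0.0000 \end{bmatrix}$$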
By comparison, when Bob implements his revised iterative scheme beginning with the 
same initial page state vector but with $\delta = 0.85$, he obtains 

$$\mathbf{x}^{(0)} = \begin{bmatrix} 0.1667 \\ 0.1667 \\ 0.1667 \\ 0.1667 \\ 0.1667 \\ 0.1667 \end{bmatrix}, \quad \mathbf{x}^{(5)} = \begin{bmatrix} 0.1891 \\ 0.3480 \\ 0.3550 \\ 0.0352 \\ 0.0250 \\ 0.0477 \end{bmatrix}, \quad \mathbf{x}^{(10)} = \begin{bmatrix} 0.1890 \\ 0.3464 \\ 0.3576 \\ 0.0350 \\ 0.0250 \\ 0.0469 \end{bmatrix}, \quad \mathbf{x}^{(15)} = \begin{bmatrix} 0.1892 \\ 0.3462 \\ 0.3578 \\ 0.0350 \\ 0.0250 \\ 0.0469 \end{bmatrix}$$


From these computations we see that whereas Alice’s scheme leads to ranks of 0 for 
Pages 4, 5, and 6, Bob’s strategy leads to more reasonable nonzero ranks for these pages. 

Mathematically, including a damping factor $\delta$ ($0 < \delta < 1$) ensures that the matrix 
M is regular (Definition 2 of Section 5.5 or Definition 3 of Section 10.4) and that for any 
normalized initial state vector the iterates $\mathbf{x}^{(k)}$ converge to an eigenvector that has positive 
entries and corresponds to the eigenvalue 1 (Theorem 5.5.1 or Theorem 10.4.3). 
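
The two iterative schemes compared above can be tried out with a short program. The following is only a sketch in plain Python, not part of the text's development: the function name `pagerank_iterate` is hypothetical, the matrix `B` is the transition matrix of the six-page example, and the update line is a direct transcription of the damped scheme (with the damping factor set to 1 it reduces to Alice's undamped scheme).

```python
def pagerank_iterate(B, delta, steps):
    """Apply x(k) = delta*B*x(k-1) + (1 - delta)/n * [1,...,1]^T `steps` times,
    starting from the state vector whose entries are all 1/n."""
    n = len(B)
    x = [1.0 / n] * n
    for _ in range(steps):
        x = [delta * sum(B[i][j] * x[j] for j in range(n)) + (1.0 - delta) / n
             for i in range(n)]
    return x


# Transition matrix of the six-page example (each column sums to 1).
B = [
    [0, 1/2, 0, 0,   1/3, 1/4],
    [0, 0,   1, 0,   1/3, 1/4],
    [1, 1/2, 0, 1/2, 0,   1/4],
    [0, 0,   0, 0,   0,   1/4],
    [0, 0,   0, 0,   0,   0],
    [0, 0,   0, 1/2, 1/3, 0],
]

alice = pagerank_iterate(B, 1.0, 15)   # no damping (Alice's scheme)
bob = pagerank_iterate(B, 0.85, 15)    # damping factor 0.85 (Bob's scheme)
print([round(v, 4) for v in alice])
print([round(v, 4) for v in bob])
```

The two printed vectors can be compared with the fifteenth iterates listed in the text; increasing `steps` shows the entries settling down to fixed values.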

A Final Note Although Markov chain theory had long been used in ranking nodes of networks, the 
introduction of a damping factor was the main innovation of the PageRank algorithm 
used by the Google search engine. This algorithm is named for Larry Page who, along 
with Sergey Brin, founded the Google company in the late 1990s. 


Exercise Set 10.20 

1. Without damping, find the page ranks of the following webgraphs 
of three pages by determining their normalized eigenvectors 
for the eigenvalue 1. 

( a ) ( b ) 

◄ Figure Ex-1 


2. Show that starting with an initial state vector with equal entries 
in the iterative scheme $\mathbf{x}^{(k)} = B\mathbf{x}^{(k-1)}$ is equivalent to 
averaging the iterates obtained by starting with each of the 
pages in the webgraph individually. 

3. Show that if every page in a webgraph is linked to every other 
page, then all the pages have the same rank for any damping 
factor $\delta$ in [0, 1]. 

4. Show that the matrix M in Equation (7) is a transition matrix; 
that is, its entries are nonnegative and its column sums are all 
equal to 1. Also show that M can be written as in Equation (8). 

5. Show that the iteration scheme $\mathbf{x}^{(k)} = M\mathbf{x}^{(k-1)}$ in Equation (7) 
with the damping factor $\delta$ can be written as in Equation (9). 

6. A dangling page (one with no outgoing links) can be dealt with 
by inserting virtual links to all of the pages in the webgraph, 
including itself. How does this change the adjacency matrix 
and the transition matrix for any damping factor $\delta$? 

7. Suppose a webgraph has only two pages and each page has a 
link to the other. 

(a) Without damping (Alice’s strategy), find the eigenvalues 
of the transition matrix and the eigenvector for the eigenvalue 
1. Show that if the initial page state vector is 
$\mathbf{x}^{(0)} = [1 \;\; 0]^T$, the iteration scheme $\mathbf{x}^{(k)} = B\mathbf{x}^{(k-1)}$ does 
not converge. However, show that the fractional page 
count converges to $[1/2 \;\; 1/2]^T$. 

(b) With damping $\delta$ in [0, 1) (Bob’s strategy), find the eigenvalues 
of the transition matrix and the eigenvector for the 
eigenvalue 1. Show that for any initial page state vector $\mathbf{x}^{(0)}$ 
the iteration scheme $\mathbf{x}^{(k)} = M\mathbf{x}^{(k-1)}$ converges. Do this by 
finding an explicit expression for $M^k$ for $k = 1, 2, \ldots$. 

8. By using the fact that a matrix and its transpose have the same 
set of eigenvalues, show that any transition probability matrix 
(a square matrix with nonnegative entries, all of whose column 
sums are 1) has the eigenvalue 1. 
In Exercises 9-13, a series of web pages constitutes a slide show. 
Each page has one or more of the following navigation buttons, 
each of which is a link to another web page in the webgraph, as 
described: 

NEXT (go to next slide/page) 

PREVIOUS (go to previous slide/page) 

FIRST (go to first slide/page) 

Draw the directed graph for each example and construct its transition 
matrix, assuming no damping. Then find the normalized 
eigenvector corresponding to the eigenvalue 1, which determines 
the ranks of the slides. 

9. Slide show with 4 slides in which 

Slide 1 contains a NEXT button 

Slide 2 contains NEXT and PREVIOUS buttons 

Slide 3 contains NEXT, PREVIOUS, and FIRST buttons 

Slide 4 contains PREVIOUS and FIRST buttons 

10. Slide show with n slides in which 

Slide 1 contains a NEXT button 

Slides 2 to n - 1 contain NEXT and FIRST buttons 

Slide n contains a FIRST button 

[Note: The transition matrix is of the form of a Leslie matrix, 
used in models of population growth. See Section 10.16.] 


11. Slide show with n slides in which 

Slide 1 contains a NEXT button 

Slides 2 to n - 1 contain PREVIOUS and NEXT buttons 
Slide n contains a PREVIOUS button 

12. Slide show with 5 slides in which 

Slide 1 contains a NEXT button 

Slide 2 contains NEXT and PREVIOUS buttons 

Slide 3 contains NEXT, PREVIOUS, and FIRST buttons 

Slide 4 contains NEXT, PREVIOUS, and FIRST buttons 

Slide 5 contains PREVIOUS and FIRST buttons 

13. Slide show with 4 slides in which 

Slide 1 contains a NEXT button 
Slide 2 contains NEXT and PREVIOUS buttons 
Slide 3 contains NEXT and PREVIOUS buttons 
Slide 4 contains PREVIOUS and FIRST buttons 



APPENDIX A WORKING WITH PROOFS 


Linear algebra is different from other mathematics courses that you may encounter in that it 
is more than a collection of problem-solving techniques. Even if you learn to solve all of the 
computational problems in this text, you will have fallen short in your mastery of the subject. 
This is because innovative uses of linear algebra typically require new techniques based on an 
understanding of its theorems, their interrelationships, and their proofs. While it is impossible 
to teach you everything you will need to do proofs, this appendix will provide some guidelines 
that may help. 


What Is a Proof? In essence, a proof is a “convincing argument” that justifies the truth of a mathematical 
statement. Although what may be convincing to one person may not be convincing to 
another, experience has led mathematicians to establish clear standards on what is to be 
considered an acceptable proof and what is not. We will try to explain here some of the 
logical steps required of an acceptable proof. 

Formality In high-school geometry you may have been asked to prove theorems by formally listing 
statements on the left and justifications on the right. That level of formality is not 
required in linear algebra. Rather, a proof need only be an argument, written in complete 
sentences, that leads step by step to a logical conclusion, and in which each step is justified 
by referencing some statement whose validity is either self-evident or has been previously 
proved. 

How to Read Theorems Most theorems are of the form 

If H is true, then C is true. (1) 

where H is a statement called the hypothesis and C is a statement called the conclusion. 
In formal logic one denotes a theorem of this form as 

$$H \Rightarrow C \tag{2}$$

which is read, “H implies C.” A statement of this type is considered to be true if the 
conclusion C is true in all cases where the hypothesis H is true, and it is considered to be 
false if there is at least one case where H is true and C is false. As an example, consider 
the statement 

If a and b are both positive numbers, then ab is a positive number. (3) 

In this statement, 

H = a and b are both positive numbers (4) 

C = ab is a positive number (5) 

Statement (3) is true because C is true in all cases where H is true. On the other hand, 
the statement 

If a and b are positive integers, then $\sqrt{ab}$ is a positive integer. (6) 

is not true because there exist cases where the hypothesis is true and the conclusion is 
false; for example, if a = 2 and b = 3, then $\sqrt{ab} = \sqrt{6}$ is not an integer. 

Sometimes it is desirable to phrase statements in a negative way. For example, statement 
(3) can be rephrased equivalently as 

If ab is not a positive number, then a and b are not both positive numbers. (7) 

If we write ~H to mean that H is false and ~C to mean that C is false, then the structure 
of statement (7) is 

$${\sim}C \Rightarrow {\sim}H \tag{8}$$

This is called the contrapositive form of (2). It can be shown that a statement and its 
contrapositive are logically equivalent; that is, if the statement is true, then so is its 
contrapositive and vice versa. 

The converse of a theorem is the statement that results when the hypothesis and conclusion 
are interchanged. Thus, the converse of the statement $H \Rightarrow C$ is the statement 
$C \Rightarrow H$. Whereas the contrapositive of a true statement must itself be true, the converse 
of a true statement may or may not be true. For example, the converse of the true 
statement (3) is the false statement 

If ab is a positive number, then a and b are both positive numbers. 
whereas the converse of the true statement 

If the numbers a and b are both positive or both negative, then 
ab is a positive number. 

is a true statement. 
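
The relationship between a statement, its contrapositive, and its converse can be checked mechanically if each of H and C is treated as simply true or false. The following sketch does this in Python; it is an illustration under that truth-table assumption, and the helper name `implies` is hypothetical rather than anything defined in the text.

```python
from itertools import product


def implies(p, q):
    """Truth value of 'p implies q': false only when p is true and q is false."""
    return (not p) or q


# All four combinations of truth values for (H, C).
cases = list(product([True, False], repeat=2))

# The contrapositive ~C => ~H agrees with H => C in every case.
contrapositive_equivalent = all(
    implies(h, c) == implies(not c, not h) for h, c in cases
)

# The converse C => H disagrees with H => C in at least one case.
converse_equivalent = all(
    implies(h, c) == implies(c, h) for h, c in cases
)

print(contrapositive_equivalent)  # True
print(converse_equivalent)        # False
```

The case that separates a statement from its converse is H true and C false (or the reverse), which is exactly the situation exhibited by the false converse of statement (3).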


WARNING Do not confuse the terms “contrapositive” and “converse.” 

In those special cases where a statement $H \Rightarrow C$ and its converse $C \Rightarrow H$ are both 
true, we say that H and C are equivalent statements. We denote this by writing 

$$H \Leftrightarrow C \tag{9}$$

which is read, “H is equivalent to C” or, more commonly, “H is true if and only if C is 
true.” For example, if a and b are real numbers, then 

$$a > b \text{ if and only if } (a - b) > 0 \tag{10}$$

To prove an “if and only if” statement of form (9), you must prove both $H \Rightarrow C$ and 
$C \Rightarrow H$. 

Equivalent statements are often phrased in other ways. For example, statement (10) 
might also be expressed as 

If a > b, then (a - b) > 0 and conversely. 

Sometimes two true statements will give you a third true statement for free. Specifically, 
if it is true that $H \Rightarrow C$ and $C \Rightarrow D$, then it follows that $H \Rightarrow D$ must also be 
true. For example, consider the following two theorems from geometry. 


THEOREM 1 If opposite sides of a quadrilateral are parallel, then the quadrilateral is a 
parallelogram. 

THEOREM 2 Opposite sides of a parallelogram have equal lengths. 

Because the conclusion of the first theorem is essentially the hypothesis of the second, 
the two theorems together yield the following third theorem. 

THEOREM 3 If opposite sides of a quadrilateral are parallel, then they have equal 
lengths. 
▲ Figure A.1 

To take this idea a step further, three true statements can sometimes yield three other 
true statements for free. Specifically, if 

$$H \Rightarrow C, \quad C \Rightarrow D, \quad D \Rightarrow H \tag{11}$$

then we have the implication loop in Figure A.1, from which we see that 

$$C \Rightarrow H, \quad D \Rightarrow C, \quad H \Rightarrow D$$

By combining this result with (11) we obtain 

$$H \Leftrightarrow C, \quad C \Leftrightarrow D, \quad D \Leftrightarrow H \tag{12}$$

In summary, if you want to prove the three equivalences in (12) you need only prove the 
three implications in (11). 


Reductio ad Absurdum It is a matter of logic that a statement cannot be both true and false. This fact is the 
basis for a method of proof, called “reductio ad absurdum” or, more commonly, “proof 
by contradiction,” the idea of which is to make the assumption that the conclusion of a 
statement is false and show that this leads to a contradiction of some sort. The underlying 
logic is that if $H \Rightarrow C$ is a true statement, then the statement 

$$(H \text{ and } {\sim}C) \Rightarrow C$$

must be false, for otherwise C would be both true and false. 


Sets Many of the proofs in this text are concerned with sets (or collections) of objects, the 
objects being called the elements of the set. Although a set can generally include any kinds 
of objects, in linear algebra the objects are typically “scalars,” “matrices,” or “vectors” 
(terms that are all defined in the text). We assume that you are already familiar with the 
basic terminology and notation of sets, but we will review it quickly here. 

Sets are generally denoted by capital letters and their elements by lowercase letters. 
One way to describe a set is to simply list its elements enclosed by braces; for example, 

$$S = \{1, 3, 5\} \tag{13}$$

By agreement, the elements of a set must all be different, and the order in which the elements 
are listed does not matter. Thus, for example, the above set might also be written as 

$$S = \{3, 5, 1\} \quad\text{or}\quad S = \{5, 1, 3\}$$

To indicate that an element a is a member of a set S we write $a \in S$ (read, “a belongs 
to S”), and to indicate that a is not a member of S we write $a \notin S$ (read, “a does not 
belong to S”). Thus, for the set in (13) we have 

$$3 \in S \quad\text{and}\quad 4 \notin S$$

There are two common ways of denoting sets with infinitely many elements: If the 
elements have some obvious notational pattern, then the set can be denoted by explicitly 
specifying some initial elements and using dots to indicate that the remaining elements 
follow the same pattern. For example, the set of positive integers might be denoted as 

S = {1,2, 3,...} (14) 

An alternative method for denoting the set S in (14) is to write 

S = {x: x is a positive integer} 

where the right side is read, “the set of all x such that x is a positive integer.” This is 
called set-builder notation. In general, set-builder notation has the form 

$$S = \{x : \underline{\qquad\qquad}\} \tag{15}$$

where the blank line is replaced by a description that defines those and only those elements 
in the set S. Of particular interest in this text are the set of real numbers, denoted by R, 
the set of points in the plane, denoted by $R^2$, and the set of points in three-dimensional 
space, denoted by $R^3$. The latter two can be described in set-builder notation as 

$$R^2 = \{(x, y) : x, y \in R\} \quad\text{and}\quad R^3 = \{(x, y, z) : x, y, z \in R\}$$

Operations on Sets If A and B are arbitrary sets, then the union of A and B, denoted by $A \cup B$, is the set 
of elements that belong to A or B or both; and the intersection of A and B, denoted 
by $A \cap B$, is the set of elements that belong to both A and B. These operations are 
illustrated in Figure A.2 using Venn diagrams, named for the British logician John A. 
Venn (1834-1923). In those diagrams the sets A and B are the regions enclosed by the 
circles, and the sets $A \cup B$ and $A \cap B$ are shaded. In the event that the sets A and B have 
no common elements, then we say that the sets are disjoint and we write $A \cap B = \emptyset$, 
where the symbol $\emptyset$ denotes a set with no elements called the empty set. 

If every element of a set A belongs as well to a set B, then we say that A is a subset 
of B and we write $A \subset B$. If $A \subset B$ and $B \subset A$, then A and B have exactly the same 
elements, so we say that A and B are equal and we write A = B. 

Ordered Sets In certain linear algebra problems the order in which elements are listed is important, 
so we will want to consider ordered sets, that is, sets in which duplicate elements are not 
allowed but order matters. Thus, for example, 

$$S_1 = \{3, 5, 1\} \quad\text{and}\quad S_2 = \{5, 1, 3\}$$

are the same sets, but not the same ordered sets. 

How to Do Proofs A good first step in a proof is to write down in complete sentences what is given (i.e., 
the hypothesis H) and what is to be proved (i.e., the conclusion C). 

Once you clearly understand what is given and what is to be proved, you must decide 
whether you want to prove the theorem directly, or in contrapositive form, or by 
reductio ad absurdum. You might restate the theorem in the three ways and see which 
form seems most promising. 

Next, you might want to review earlier theorems that could be relevant to your proof. 
From this point on it is a matter of experience and intuition, but keep in mind that 
proving theorems is not an easy task, so don’t be discouraged. As you read through 
the proofs in the text, observe the techniques and try to make them part of your own 
repertoire. 
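
The set terminology reviewed earlier in this appendix (union, intersection, disjoint sets, subsets, and the distinction between sets and ordered sets) can be experimented with directly. The sketch below uses Python's built-in `set` type; using tuples to stand in for ordered sets is an assumption made only for illustration, since tuples, unlike ordered sets, permit repeated entries.

```python
A = {1, 3, 5}
B = {3, 5, 7}

union = A | B                    # elements that belong to A or B or both
intersection = A & B             # elements that belong to both A and B
disjoint = (A & {2, 4}) == set() # no common elements: intersection is empty
is_subset = {3, 5} <= A          # every element of {3, 5} belongs to A

same_set = {3, 5, 1} == {5, 1, 3}          # order is ignored for sets
same_ordered = (3, 5, 1) == (5, 1, 3)      # order matters for tuples

print(sorted(union), sorted(intersection))
print(disjoint, is_subset, same_set, same_ordered)
```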


APPENDIX B COMPLEX NUMBERS 


Complex Numbers 


Complex numbers arise naturally in the course of solving polynomial equations. For example, 
the solutions of the quadratic equation $ax^2 + bx + c = 0$, which are given by the quadratic 
formula 

$$x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$$

are complex numbers if the expression inside the radical is negative. In this appendix we will 
review some of the basic ideas about complex numbers that are used in this text. 


To deal with the problem that the equation $x^2 = -1$ has no real solutions, mathematicians 
of the eighteenth century invented the “imaginary” number 

$$i = \sqrt{-1}$$

which is assumed to have the property 

$$i^2 = (\sqrt{-1})^2 = -1$$

but which otherwise has the algebraic properties of a real number. An expression of the 
form 

$$a + bi \quad\text{or}\quad a + ib$$

in which a and b are real numbers is called a complex number. Sometimes it will be 
convenient to use a single letter, typically z, to denote a complex number, in which case 
we write 

$$z = a + bi \quad\text{or}\quad z = a + ib$$

The number a is called the real part of z and is denoted by Re(z), and the number b is 
called the imaginary part of z and is denoted by Im(z). Thus, 

Re(3 + 2i) = 3, Im(3 + 2i) = 2 

Re(1 - 5i) = 1, Im(1 - 5i) = Im(1 + (-5)i) = -5 

Re(7i) = Re(0 + 7i) = 0, Im(7i) = Im(0 + 7i) = 7 

Re(4) = Re(4 + 0i) = 4, Im(4) = Im(4 + 0i) = 0 

Two complex numbers are considered equal if and only if their real parts are equal and 
their imaginary parts are equal; that is, 

a + bi = c + di if and only if a = c and b = d 

A complex number z = bi whose real part is zero is said to be pure imaginary. A complex 
number z = a whose imaginary part is zero is a real number, so the real numbers can be 
viewed as a subset of the complex numbers. 

Complex numbers are added, subtracted, and multiplied in accordance with the 
standard rules of algebra but with $i^2 = -1$: 

$$(a + bi) + (c + di) = (a + c) + (b + d)i \tag{1}$$

$$(a + bi) - (c + di) = (a - c) + (b - d)i \tag{2}$$

$$(a + bi)(c + di) = (ac - bd) + (ad + bc)i \tag{3}$$

Multiplication formula (3) is obtained by expanding the left side and using the fact that 
$i^2 = -1$. Also note that if b = 0, then the multiplication formula simplifies to 

$$a(c + di) = ac + adi \tag{4}$$

The set of complex numbers with these operations is commonly denoted by the symbol 
C and is called the complex number system. 
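
Multiplication formula (3) can be checked against a language's built-in complex arithmetic. The sketch below is only an illustration: the function name `multiply` is hypothetical, and Python writes the imaginary unit as `j` rather than `i`.

```python
def multiply(a, b, c, d):
    """Return (real part, imaginary part) of (a + bi)(c + di) using formula (3)."""
    return (a * c - b * d, a * d + b * c)


real, imag = multiply(3, -2, 4, 5)           # (3 - 2i)(4 + 5i)
builtin = complex(3, -2) * complex(4, 5)     # same product with Python's complex type

print(real, imag)   # 22 7
print(builtin)      # (22+7j)
```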
▲ Figure B.4 

► EXAMPLE 1 Multiplying Complex Numbers 

As a practical matter, it is usually more convenient to compute products of complex 
numbers by expansion, rather than substituting in (3). For example, 

$$(3 - 2i)(4 + 5i) = 12 + 15i - 8i - 10i^2 = (12 + 10) + 7i = 22 + 7i$$ ◄ 

The Complex Plane A complex number z = a + bi can be associated with the ordered pair (a, b) of real 
numbers and represented geometrically by a point or a vector in the xy-plane (Figure 
B.1). We call this the complex plane. Points on the x-axis have an imaginary part of zero 
and hence correspond to real numbers, whereas points on the y-axis have a real part of 
zero and correspond to pure imaginary numbers. Accordingly, we call the x-axis the real 
axis and the y-axis the imaginary axis (Figure B.2). 


▲ Figure B.1 (z = a + bi shown as a point and as a vector in the xy-plane) 

▲ Figure B.2 (real axis and imaginary axis; a is the real part of z and b is the imaginary part of z) 


Complex numbers can be added, subtracted, or multiplied by real numbers geometrically 
by performing these operations on their associated vectors (Figure B.3, for 
example). In this sense the complex number system C is closely related to $R^2$, the main 
difference being that complex numbers can be multiplied to produce other complex numbers, 
whereas there is no multiplication operation on $R^2$ that produces other vectors in 
$R^2$ (the dot product defined in Section 3.2 produces a scalar, not a vector in $R^2$). 

► Figure B.3 (the sum of two complex numbers; the difference of two complex numbers) 


If z = a + bi is a complex number, then the complex conjugate of z, or more simply, 
the conjugate of z, is denoted by $\bar{z}$ (read, “z bar”) and is defined by 

$$\bar{z} = a - bi \tag{5}$$

Numerically, $\bar{z}$ is obtained from z by reversing the sign of the imaginary part, and 
geometrically it is obtained by reflecting the vector for z about the real axis (Figure B.4). 


▲ Figure B.5 ($|z| = \sqrt{a^2 + b^2}$ is the length of the vector z = a + bi) 

► EXAMPLE 2 Some Complex Conjugates 

$$\begin{array}{ll} z = 3 + 4i & \bar{z} = 3 - 4i \\ z = -2 - 5i & \bar{z} = -2 + 5i \\ z = i & \bar{z} = -i \\ z = 7 & \bar{z} = 7 \end{array}$$ ◄ 

Remark The last computation in this example illustrates the fact that a real number is equal to 
its complex conjugate. More generally, $z = \bar{z}$ if and only if z is a real number. 

The following computation shows that the product of a complex number z = a + bi 
and its conjugate $\bar{z} = a - bi$ is a nonnegative real number: 

$$z\bar{z} = (a + bi)(a - bi) = a^2 - abi + bai - b^2i^2 = a^2 + b^2 \tag{6}$$

You will recognize that 

$$\sqrt{z\bar{z}} = \sqrt{a^2 + b^2}$$

is the length of the vector corresponding to z (Figure B.5); we call this length the modulus 
(or absolute value) of z and denote it by |z|. Thus, 

$$|z| = \sqrt{z\bar{z}} = \sqrt{a^2 + b^2} \tag{7}$$

Note that if b = 0, then z = a is a real number and $|z| = \sqrt{a^2} = |a|$, which tells us that 
the modulus of a real number is the same as its absolute value. 

► EXAMPLE 3 Some Modulus Computations 

$$\begin{array}{ll} z = 3 + 4i & |z| = \sqrt{3^2 + 4^2} = 5 \\ z = -4 - 5i & |z| = \sqrt{(-4)^2 + (-5)^2} = \sqrt{41} \\ z = i & |z| = \sqrt{0^2 + 1^2} = 1 \end{array}$$ ◄ 

Reciprocals and Division If $z \neq 0$, then the reciprocal (or multiplicative inverse) of z is denoted by 1/z (or $z^{-1}$) 
and is defined by the property 

$$\left(\frac{1}{z}\right) z = 1$$

This equation has a unique solution for 1/z, which we can obtain by multiplying both 
sides by $\bar{z}$ and using the fact that $z\bar{z} = |z|^2$ [see (7)]. This yields 

$$\frac{1}{z} = \frac{\bar{z}}{|z|^2} \tag{8}$$

If $z_2 \neq 0$, then the quotient $z_1/z_2$ is defined to be the product of $z_1$ and $1/z_2$. This 
yields the formula 

$$\frac{z_1}{z_2} = \frac{\bar{z}_2}{|z_2|^2}\, z_1 = \frac{z_1\bar{z}_2}{|z_2|^2} \tag{9}$$

Observe that the expression on the right side of (9) results if the numerator and 
denominator of $z_1/z_2$ are multiplied by $\bar{z}_2$. As a practical matter, this is often the best 
way to perform divisions of complex numbers. 
► EXAMPLE 4 Division of Complex Numbers 

Let $z_1 = 3 + 4i$ and $z_2 = 1 - 2i$. Express $z_1/z_2$ in the form a + bi. 

Solution We will multiply the numerator and denominator of $z_1/z_2$ by $\bar{z}_2 = 1 + 2i$. 
This yields 

$$\frac{z_1}{z_2} = \frac{z_1\bar{z}_2}{z_2\bar{z}_2} = \frac{3 + 4i}{1 - 2i} \cdot \frac{1 + 2i}{1 + 2i} = \frac{3 + 6i + 4i + 8i^2}{1 - 4i^2} = \frac{-5 + 10i}{5} = -1 + 2i$$ ◄ 
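
The division technique of Example 4 (multiply numerator and denominator by the conjugate of the denominator) translates directly into code. This is a sketch only; the function name `divide` is hypothetical, and Python's built-in `/` operator for complex numbers is shown alongside for comparison.

```python
def divide(z1, z2):
    """Compute z1/z2 by multiplying numerator and denominator by the conjugate of z2."""
    numerator = z1 * z2.conjugate()
    denominator = (z2 * z2.conjugate()).real   # equals |z2|**2, a real number
    return complex(numerator.real / denominator, numerator.imag / denominator)


z1 = complex(3, 4)
z2 = complex(1, -2)

print(divide(z1, z2))   # (-1+2j)
print(z1 / z2)          # built-in division, for comparison
```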


The following theorems list some useful properties of the modulus and conjugate 
operations. 

THEOREM 1 The following results hold for any complex numbers $z$, $z_1$, and $z_2$. 

(a) $\overline{z_1 + z_2} = \bar{z}_1 + \bar{z}_2$ 

(b) $\overline{z_1 - z_2} = \bar{z}_1 - \bar{z}_2$ 

(c) $\overline{z_1 z_2} = \bar{z}_1\bar{z}_2$ 

(d) $\overline{z_1/z_2} = \bar{z}_1/\bar{z}_2$ 

(e) $\bar{\bar{z}} = z$ 

THEOREM 2 The following results hold for any complex numbers $z$, $z_1$, and $z_2$. 

(a) $|\bar{z}| = |z|$ 

(b) $|z_1 z_2| = |z_1||z_2|$ 

(c) $|z_1/z_2| = |z_1|/|z_2|$ 

(d) $|z_1 + z_2| \leq |z_1| + |z_2|$ 


▲ Figure B.6 

Polar Form of a Complex Number If z = a + bi is a nonzero complex number, and if $\phi$ is an angle from the real axis to 
the vector z, then, as suggested in Figure B.6, the real and imaginary parts of z can be 
expressed as 

$$a = |z|\cos\phi \quad\text{and}\quad b = |z|\sin\phi \tag{10}$$

Thus, the complex number z = a + bi can be expressed as 

$$z = |z|(\cos\phi + i\sin\phi) \tag{11}$$

which is called a polar form of z. The angle $\phi$ in this formula is called an argument of z. 
The argument of z is not unique because we can add or subtract any multiple of $2\pi$ to it 
to obtain a different argument of z. However, there is only one argument whose radian 
measure satisfies 

$$-\pi < \phi \leq \pi \tag{12}$$

This is called the principal argument of z. 
► EXAMPLE 5 Polar Form of a Complex Number 

Express $z = 1 - \sqrt{3}\,i$ in polar form using the principal argument. 

Solution The modulus of z is 

$$|z| = \sqrt{1^2 + (-\sqrt{3})^2} = \sqrt{4} = 2$$

Thus, it follows from (10) with a = 1 and $b = -\sqrt{3}$ that 

$$1 = 2\cos\phi \quad\text{and}\quad -\sqrt{3} = 2\sin\phi$$

and this implies that 

$$\cos\phi = \frac{1}{2} \quad\text{and}\quad \sin\phi = -\frac{\sqrt{3}}{2}$$

The unique angle $\phi$ that satisfies these equations and whose radian measure satisfies (12) 
is $\phi = -\pi/3$ (Figure B.7). Thus, a polar form of z is 

$$z = 2\left(\cos\left(-\frac{\pi}{3}\right) + i\sin\left(-\frac{\pi}{3}\right)\right) = 2\left(\cos\frac{\pi}{3} - i\sin\frac{\pi}{3}\right)$$ ◄ 

▲ Figure B.7 
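
The modulus and principal argument found in Example 5 can be confirmed numerically. The sketch below relies on Python's standard `cmath.polar`, which returns the modulus together with an argument in the interval from $-\pi$ to $\pi$; the variable names are illustrative only.

```python
import cmath
import math

z = complex(1, -math.sqrt(3))
modulus, argument = cmath.polar(z)   # (|z|, argument of z in radians)

print(modulus)                       # approximately 2
print(argument, -math.pi / 3)        # the two values should agree closely

# Rebuild z from its polar form |z|(cos(phi) + i sin(phi)).
rebuilt = modulus * complex(math.cos(argument), math.sin(argument))
print(abs(rebuilt - z) < 1e-12)
```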


Geometric Interpretation of Multiplication and Division of Complex Numbers We will now show how polar forms of complex numbers provide geometric interpretations 
of multiplication and division. Let 

$$z_1 = |z_1|(\cos\phi_1 + i\sin\phi_1) \quad\text{and}\quad z_2 = |z_2|(\cos\phi_2 + i\sin\phi_2)$$

be polar forms of the nonzero complex numbers $z_1$ and $z_2$. Multiplying, we obtain 

$$z_1 z_2 = |z_1||z_2|[(\cos\phi_1\cos\phi_2 - \sin\phi_1\sin\phi_2) + i(\sin\phi_1\cos\phi_2 + \cos\phi_1\sin\phi_2)]$$

Now applying the trigonometric identities 

$$\cos(\phi_1 + \phi_2) = \cos\phi_1\cos\phi_2 - \sin\phi_1\sin\phi_2$$

$$\sin(\phi_1 + \phi_2) = \sin\phi_1\cos\phi_2 + \cos\phi_1\sin\phi_2$$

yields 

$$z_1 z_2 = |z_1||z_2|[\cos(\phi_1 + \phi_2) + i\sin(\phi_1 + \phi_2)] \tag{13}$$

which is a polar form of the complex number that has modulus $|z_1||z_2|$ and argument 
$\phi_1 + \phi_2$. Thus, we have shown that multiplying two complex numbers has the geometric 
effect of multiplying their moduli and adding their arguments (Figure B.8). 

Similar kinds of computations show that 

$$\frac{z_1}{z_2} = \frac{|z_1|}{|z_2|}[\cos(\phi_1 - \phi_2) + i\sin(\phi_1 - \phi_2)] \tag{14}$$

which tells us that dividing complex numbers has the geometric effect of dividing their 
moduli and subtracting their arguments (each in the appropriate order). 
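
Formulas (13) and (14) can be tested numerically by converting to polar form, combining moduli and arguments, and converting back. The following is a sketch under the assumption that Python's standard `cmath.polar` and `cmath.rect` are used for the conversions; the function names `polar_multiply` and `polar_divide` and the sample values are hypothetical.

```python
import cmath


def polar_multiply(z1, z2):
    """Multiply the moduli and add the arguments, as in formula (13)."""
    r1, phi1 = cmath.polar(z1)
    r2, phi2 = cmath.polar(z2)
    return cmath.rect(r1 * r2, phi1 + phi2)


def polar_divide(z1, z2):
    """Divide the moduli and subtract the arguments, as in formula (14)."""
    r1, phi1 = cmath.polar(z1)
    r2, phi2 = cmath.polar(z2)
    return cmath.rect(r1 / r2, phi1 - phi2)


z1 = complex(2, 1)
z2 = complex(-1, 3)
print(abs(polar_multiply(z1, z2) - z1 * z2) < 1e-12)
print(abs(polar_divide(z1, z2) - z1 / z2) < 1e-12)
```

Both comparisons allow a small tolerance because the trigonometric conversions introduce rounding error.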


► EXAMPLE 6 Multiplying and Dividing in Polar Form 

Use polar forms of the complex numbers $z_1 = 1 + \sqrt{3}\,i$ and $z_2 = \sqrt{3} + i$ to compute 
$z_1 z_2$ and $z_1/z_2$. 

Solution Polar forms of these complex numbers are 

$$z_1 = 2\left(\cos\frac{\pi}{3} + i\sin\frac{\pi}{3}\right) \quad\text{and}\quad z_2 = 2\left(\cos\frac{\pi}{6} + i\sin\frac{\pi}{6}\right)$$

(verify). Thus, it follows from (13) that 

$$z_1 z_2 = 4\left[\cos\left(\frac{\pi}{3} + \frac{\pi}{6}\right) + i\sin\left(\frac{\pi}{3} + \frac{\pi}{6}\right)\right] = 4\left[\cos\left(\frac{\pi}{2}\right) + i\sin\left(\frac{\pi}{2}\right)\right] = 4i$$

and from (14) that 

$$\frac{z_1}{z_2} = 1 \cdot \left[\cos\left(\frac{\pi}{3} - \frac{\pi}{6}\right) + i\sin\left(\frac{\pi}{3} - \frac{\pi}{6}\right)\right] = \cos\left(\frac{\pi}{6}\right) + i\sin\left(\frac{\pi}{6}\right) = \frac{\sqrt{3}}{2} + \frac{1}{2}i$$

As a check, let us calculate $z_1 z_2$ and $z_1/z_2$ directly: 

$$z_1 z_2 = (1 + \sqrt{3}\,i)(\sqrt{3} + i) = \sqrt{3} + i + 3i + \sqrt{3}\,i^2 = 4i$$

$$\frac{z_1}{z_2} = \frac{1 + \sqrt{3}\,i}{\sqrt{3} + i} = \frac{1 + \sqrt{3}\,i}{\sqrt{3} + i} \cdot \frac{\sqrt{3} - i}{\sqrt{3} - i} = \frac{\sqrt{3} - i + 3i - \sqrt{3}\,i^2}{3 - i^2} = \frac{2\sqrt{3} + 2i}{4} = \frac{\sqrt{3}}{2} + \frac{1}{2}i$$

which agrees with the results obtained using polar forms. ◄ 

Remark The complex number i has a modulus of 1 and a principal argument of $\pi/2$. Thus, 
if z is a complex number, then iz has the same modulus as z but its argument is greater by $\pi/2$ 
(= 90°); that is, multiplication by i has the geometric effect of rotating the vector z counterclockwise 
by 90° (Figure B.9). 

▲ Figure B.9 

DeMoivre's Formula If n is a positive integer, and if z is a nonzero complex number with polar form 

$$z = |z|(\cos\phi + i\sin\phi)$$

then raising z to the nth power yields 

$$z^n = \underbrace{z \cdot z \cdots z}_{n \text{ factors}} = |z|^n[\cos(\underbrace{\phi + \phi + \cdots + \phi}_{n \text{ terms}}) + i\sin(\underbrace{\phi + \phi + \cdots + \phi}_{n \text{ terms}})]$$

which we can write more succinctly as 

$$z^n = |z|^n(\cos n\phi + i\sin n\phi) \tag{15}$$

In the special case where |z| = 1 this formula simplifies to 

$$z^n = \cos n\phi + i\sin n\phi$$

which, using the polar form for z, becomes 

$$(\cos\phi + i\sin\phi)^n = \cos n\phi + i\sin n\phi \tag{16}$$

This result is called DeMoivre's formula, named for the French mathematician Abraham 
de Moivre (1667-1754). 

Euler's Formula If $\theta$ is a real number, say the radian measure of some angle, then the complex exponential 
function $e^{i\theta}$ is defined to be 

$$e^{i\theta} = \cos\theta + i\sin\theta \tag{17}$$

which is sometimes called Euler’s formula, named for the Swiss mathematician Leonhard 
Euler (1707-1783). One motivation for this formula comes from the Maclaurin series in 
calculus. Readers who have studied infinite series in calculus can deduce (17) by formally 
substituting $i\theta$ for x in the Maclaurin series for $e^x$ and writing 

$$\begin{aligned} e^{i\theta} &= 1 + i\theta + \frac{(i\theta)^2}{2!} + \frac{(i\theta)^3}{3!} + \frac{(i\theta)^4}{4!} + \frac{(i\theta)^5}{5!} + \frac{(i\theta)^6}{6!} + \cdots \\ &= 1 + i\theta - \frac{\theta^2}{2!} - i\frac{\theta^3}{3!} + \frac{\theta^4}{4!} + i\frac{\theta^5}{5!} - \frac{\theta^6}{6!} + \cdots \\ &= \left(1 - \frac{\theta^2}{2!} + \frac{\theta^4}{4!} - \frac{\theta^6}{6!} + \cdots\right) + i\left(\theta - \frac{\theta^3}{3!} + \frac{\theta^5}{5!} - \cdots\right) \\ &= \cos\theta + i\sin\theta \end{aligned}$$

where the last step follows from the Maclaurin series for $\cos\theta$ and $\sin\theta$. 

If z = a + bi is any complex number, then the complex exponential $e^z$ is defined to 
be 

$$e^z = e^{a + bi} = e^a e^{ib} = e^a(\cos b + i\sin b) \tag{18}$$

It can be proved that complex exponentials satisfy the standard laws of exponents. Thus, 
for example, 

$$e^{z_1}e^{z_2} = e^{z_1 + z_2}, \qquad \frac{e^{z_1}}{e^{z_2}} = e^{z_1 - z_2}, \qquad \frac{1}{e^z} = e^{-z}$$
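
DeMoivre's formula (16) and the series argument for Euler's formula (17) can both be checked numerically. The sketch below is an illustration only: the function names `demoivre_lhs` and `maclaurin_exp_i`, the sample angles, and the choice of 30 series terms are all assumptions made for the example.

```python
import math


def demoivre_lhs(phi, n):
    """(cos(phi) + i sin(phi))**n, computed by repeated multiplication."""
    z = complex(math.cos(phi), math.sin(phi))
    result = complex(1, 0)
    for _ in range(n):
        result *= z
    return result


def maclaurin_exp_i(theta, terms):
    """Partial sum of the Maclaurin series for e**x with x = i*theta."""
    total = complex(0, 0)
    term = complex(1, 0)                       # (i*theta)**0 / 0!
    for k in range(terms):
        total += term
        term *= complex(0, theta) / (k + 1)    # next term: (i*theta)**(k+1) / (k+1)!
    return total


phi, n = 0.7, 5
rhs = complex(math.cos(n * phi), math.sin(n * phi))
print(abs(demoivre_lhs(phi, n) - rhs) < 1e-12)             # formula (16)

theta = 1.2
euler = complex(math.cos(theta), math.sin(theta))
print(abs(maclaurin_exp_i(theta, 30) - euler) < 1e-12)     # formula (17)
```

Using fewer series terms makes the second difference visibly larger, which shows the partial sums approaching $\cos\theta + i\sin\theta$ as more terms are included.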





ANSWERS TO EXERCISES 


Exercise Set 1.1 (page 8) 

1 (a), (c), and (f) are linear equations; (b), (d), and (e) are not linear equations 

(a) O H-Tj -1“ d\iX 2 = b\ (b) 0 \\X\ + Cl 12 X 2 “1“ flj3*3 = (c) d\\X\ -1“ d\ 2 X 2 + £Zl3*3 + 014 X 4 = b\ 

#21*1 + # 22*2 = bi Cl 2 ] -V | + CI 22 X 2 + 022X2 = + d 2 lX\ + Cl 22 X 2 + 022X2 + #24*4 = &2 

O21X1 + 022X2 + 022X2 — bi 


5. (a) 2xi = 0 (b) 3*i — 2x2 = 5 

3*1 — 4*2 = 0 7*i + *2 + 4*3 = — 3 

*2 = 1 — 2*2 + *3 = 7 



'-2 6 ' 


'6 -1 3 4 


0 2 0 -3 1 O' 

(a) 

3 8 

9 -3 

(b) 

0 5-1 1_ 

(b) 

-3-1 1 0 0-1 

6 2-1 2-3 6 


9. (a), (d), and (e) are solutions; (b) and (c) are not solutions 


11 (a) No points of intersection 

(b) Infinitely many points of intersection: * = ( + 2 1, y = t 

(c) One point of intersection: (—8,-4) 

(a) * = | + y = t 

(b) *i = | + | r - f + * 2 = r, *3 = s 

(c) *i = — | + \r — §s + \t, * 2 = r, *3 = s, * 4 = t 

(d) V = |fi — 1 1 2 + |?3 — |? 4 , W = fi, * = t 2 , y = t 2 , Z = ?4 

(a) * = \ + |f, y = t 

(b) *i = —4 — 3r + s, *2 = *3 = s 

(a) Add 2 times the second row to the first row. 

(b) Add the third row to the first row, or interchange the first row and the third row. 

19. (a) All values of k ^ 2 2* + 3y + z = 7 27. * + y + z = 12 

(b) All values of k 2* + y + 3z = 9 2* + y + 2z = 5 

4* + 2y + 5z = 16 — * + z = 1 

True/False 1.1 

(a) True (b) False (c) True (d) True (e) False (f) False (g) True (h) False 


Exercise Set 1.2 (page 22) 

(a) Both (b) Both (c) Both (d) Both (e) Both (f) Both (g) Row echelon form 

(a) * = —37, y = — 8 , z = 5 

(b) w = — 10 + 13f, * = — 5 + 13f, y = 2 — t, z = t 

(c) *i = — 11 — 7 j + 2 1, *2 = *3 = —4 — 3z, *4 = 9 — 3t, * 5 = t 

(d) No solution 

5 *i = 3, *2 = 1, *3 = 2 7, * = — 1 + t, y = 2s, z = s, w = f 9. *i = 3, *2 = 1, *3 = 2 

11 x = —l + t,y = 2s,z = s,w = t Has nontrivial solutions 1 : *1 = 0, *2 = 0, * 3 = 0 

r *1 = — *2 = — — t, *3 = s, *4 = r w = t, x — — t, y = t, z = 0 /1 = — 1 , / 2 = 0 , / 3 = 1 , / 4 = 2 . 

(a) Consistent; unique solution 

(b) Consistent; infinitely many solutions 

(c) Inconsistent 

(d) Insufficient information provided 




No solutions when a = —4; —a + b + c = 0 29 x = |a — y = — [a + 

infinitely many solutions when a = 4; 

one solution for all values a ^ —4 and a ^ 4 


31. E.g., 


1 3 
0 1 


and 


1 0 
0 1 


(other answers are possible) x = ±1, y = ±V3, z = ±-s/ 2 


a = 1, b = -6, c = 2, d = 10
The nonhomogeneous system has only one solution.


True/False 1.2 

(a) True (b) False (c) False (d) True (e) True (f) False (g) True (h) False (i) False 


Exercise Set 1.3 (page 36) 

(a) Undefined (b) Defined; 4 × 4 matrix (c) Defined; 4 × 2 matrix

(d) Defined; 5 × 2 matrix (e) Defined; 4 × 5 matrix (f) Defined; 5 × 5 matrix

-7 -28 

-21 -7 



7 

6 

5' 


'-5 

4 

-l" 


' 15 

o' 

r 

(a) 

-2 

1 

3 

(b) 

0 

-1 

-1 

(c) 

-5 

10 

(d) 


7 

3 

7 


-1 

1 

1 


5 

5 

- 


-14 

-35 


(e) Undefined 
(i) 5 (j) -25


(0 


' 22 

-6 

8' 


'-39 

-21 

-24' 


"o 

o' 

-2 

4 

6 

(g) 

9 

-6 

-15 

(h) 

0 

0 

10 

0 

4 


-33 

-12 

-30 


0 

0 


(k) 168 (l) Undefined



' 12 

-3" 


'42 

108 

75' 


' 3 

45 

9' 


' 3 

45 

9' 

(a) 

-4 

5 

(b) Undefined (c) 

12 

-3 

21 

(d) 

11 

-11 

17 

(e) 

11 

-11 

17 


4 

1 


36 

78 

63 


7 

17 

13 


7 

17 

13 


(f) 







12 

6 

9' 

21 

17’ 


'0-2 11" 





17 

35 

(g) 

12 1 8 

(h) 

48 

-20 

14 






24 

8 

16 


(i) 61 (j) 35 (k) 28 (l) 99



"41" 


" 6" 


'76' 

7. (a) [67 41 41] (b) [63 67 57] (c) 

21 

(d) 

6 

(e) [24 56 97] (f) 

98 


67 


63 


97 



'3' 


"-2" 


"7" 


"6" 


"-2" 


"4' 

(a) first column of AA = 3 

6 

+ 6 

5 

+ 0 

4 

(b) first column of BB = 6 

0 

+ 0 

1 

+ 7 

3 


0 


4 


9 


7 


7 


5 



"3" 


"-2" 


"7" 


"6" 


"-2" 


"4' 

second column of AA — —2 

6 

+ 5 

5 

+ 4 

4 

second column of BB = —2 

0 

+ 1 

1 

+ 7 

3 


0 


4 


9 


7 


7 


5 



"3" 


"-2" 


"7" 


"6" 


"-2" 


"4' 

third column of AA = 7 

6 

+ 4 

5 

+ 9 

4 

third column of BB = 4 

0 

+ 3 

1 

+ 5 

3 


0 


4 


9 


7 


7 


5 


(a) A = 

"2 -3 5" 

9 -1 1 

, X = 

Xi 

x 2 

, b = 

f 

-1 


1 5 4 


x 3 


0 


2 -3 
9 -1 


-3 

0 

9 

-1 


, b = 


0 -3 

1 0 

-5 9 

3 -1 


7 

-1 

0 

1 


-1 

7 


(b) A = 




(a) 5x_1 + 6x_2 - 7x_3 = 7
-x_1 - 2x_2 + 3x_3 = 0
4x_2 - x_3 = 3

(b) x + y + z = 2
2x + 3y = 2
5x - 3y - 6z = -9


k = -1 

17. 


19. 


[0 1 2 ] + 
[1 2 ] + 


-3 

-1 


[-2 
[3 4] + 


3 1] = 

[5 6] = 


4 8 

2 4 

1 
4 


+ 


-9 -3 

-3 -1 


-5 

-1 


21 . 


J 



L J 



J 

L 

Xi 


"0" 


'-3' 


'-4' 


~-2 

*2 


0 


1 


0 


0 

*3 


0 


0 


-2 


0 

r= 

+ r 

0 

+ 5 

1 

+ t 

0 

X 4 


0 




X 5 


0 


0 


0 


1 

xe_ 


1 

L 3 J 


0 


0 


0 

— 

X,b 

= - 

-6 , c 

= -1,4 = 

1 



6 8 
15 20 


15 18 

30 36 


5 
3 

22 28 
49 64 


27 The only matrix satisfying the given condition is A = 


1 1 

1 -1 

0 0 


1 1 
1 1 


and 


-1 -1 

-1 -1 


29. (a) 

(b) Four square roots can be found: 


"V5 O’ 


~-4l o’ 


o’ 

, and 

'-V5 o" 

0 3 

’ 

0 3 

’ 

0 -3 

0 -3 


33. The matrix product represents 


the total cost of items purchased in January 
the total cost of items purchased in February 
the total cost of items purchased in March 
the total cost of items purchased in April 


True/False 1.3 

(a) True (b) False (c) False (d) False (e) True (f) False (g) False (h) True (i) True (j) True
(l) False (m) True (n) True (o) False


Exercise Set 1.4 (page 49) 


" 1 3 " 

5 20 

7. 

1 o’ 

9. 

\(e x +e~ x ) 

1 

* 

1 

1 

\> 

r 

15. 

1 1' 

17. 

9 

13 

1 “ 

13 

1 1 
- 5 10 - 


.0 5. 


J 

1 

Hi 

1 

* 

i + e ~*)_ 


1 3 

_7 7 _ 


2 

13 

6 

13 _ 


19. (a) 


41 

30 


15 

11 



-15 

41 



2 

2 


(a) 


1 

2 




7 

6 


36 

13 

26 

10 


23. The matrices commute if c = 0 and a = d. 


25. X\ 


i v _ 13 
23 ’ 2 23 


27- *i = - A > *2 = n 



(a) E.g., A = [1 0; 0 0] and B = [0 1; 0 0]

(b) (A + B)(A - B) = A(A - B) + B(A - B) = A^2 - AB + BA - B^2

(c) AB = BA
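
The expansion in part (b) can be checked numerically. The following is only a sketch: it assumes the example in part (a) is A = [1 0; 0 0] and B = [0 1; 0 0] (our reading of the printed answer), and the helper names `matmul`, `add`, and `sub` are hypothetical, introduced just for this illustration.

```python
# Sketch: check that (A + B)(A - B) = A^2 - AB + BA - B^2, and that this
# differs from A^2 - B^2 when AB != BA.
# Assumption: A and B are the 2x2 example matrices read from part (a).

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def add(X, Y):
    """Entrywise sum of two matrices of the same size."""
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def sub(X, Y):
    """Entrywise difference of two matrices of the same size."""
    return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

A = [[1, 0], [0, 0]]
B = [[0, 1], [0, 0]]

# Left-hand side of part (b).
lhs = matmul(add(A, B), sub(A, B))

# Fully expanded form: ((A^2 - AB) + BA) - B^2.
expanded = sub(add(sub(matmul(A, A), matmul(A, B)), matmul(B, A)), matmul(B, B))

# The "difference of squares" that holds only when AB = BA.
difference_of_squares = sub(matmul(A, A), matmul(B, B))
```

For these matrices `lhs` agrees with `expanded` but not with `difference_of_squares`, which is consistent with part (c): the shortcut A^2 - B^2 is available only when A and B commute.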


(k) True 




35. No 37. Invertible; A 1 




39. B~ l 


True/False 1.4 

(a) False (b) False (c) False (d) False 


(e) False (f) True (g) True (h) True (i) False (j) True 


(k) False 


Exercise Set 1.5 (page 58) 

(a) Elementary (b) Not elementary (c) Not elementary (d) Not elementary 


3 (a) Add 3 times the second row to the first row: 




1 

0 

o' 

1 3 

0 1 

(b) Multiply the first row by — | : 

7 

0 

1 

0 


0 

0 

1 


(c) Add 5 times the first row to the third row: 


(a) Interchange the first and second rows: EA = 






"0 

0 

1 

O' 

"l 

0 

o' 


0 

1 

0 

0 

0 

1 

0 

(d) Interchange the first and third rows: 

1 

0 

0 

0 

5 

0 

1 






_0 

0 

0 

1 _ 


3 -6 -6 -6 

-1 -2 5 -1 



2 

-1 

0 

-4 

-4 

(b) Add —3 times the second row to the third row: EA = 

1 

-3 

-1 

5 

3 


-1 

9 

4 

-12 

-10 


(c) Add 4 times the third row to the first row: EA = 


13 28 

2 5 

3 6 



"o 

0 

f 


"o 

0 

l" 


1 

0 

o' 


"l 

0 

o' 

(a) 

0 

1 

0 

(b) 

0 

1 

0 

(c) 

0 

1 

0 

(d) 

0 

1 

0 


1 

0 

0 


1 

0 

0 


-2 

0 

1 


2 

0 

1 


(a) 


-7 4 

2 -1 


(b) Not invertible 


11. (a) The inverse is 


17. 


'-40 

16 

9' 



" 1 1 1 “ 

2 2 2 


“ 7 

2 

0 

-3' 

13 

-5 

-3 

(b) Not invertible 

13. 

1 1 1 

2 2 2 

15. 

-1 

l 

0 

5 

-2 

-1 



1 1 1 

- 2 2 2- 


0 

-l 

1_ 


p 1 

4 

1 

2 

-3 

O' 


ki 

0 

0 

o' 


" 1 


0 

0 

1 

8 

1 

4 

3 

2 

0 

(a) 

0 

h 

0 

0 

(b) 

0 

1 

0 

1 

0 

1 

0 

0 

1 

2 

0 

0 

0 


0 

0 

0 

k 

~k 

1 

1 

1 

1 


0 

0 

0 

J_ 


_0 

0 

0 

l 

L 40 

20 

10 

5 J 



fct- 







21. Any value of c other than 0 and 1 


A = 


"l 

-2 

"l 

o' 

"l o' 

"l 

5 " 

; A -1 = 

’l 

-5" 

'1 o' 

1 

O' 

'l 

2' 

0 

1 

2 

1 

oo 

1 

o 

0 

1 

0 

1 

.0 -i 

-2 

1 

0 

1 


(answer is not unique) 





'1 

0 

O' 


"1 

0 

0“ 


"l 

0 

-2" 


"1 

0 

2" 


"1 

0 

0" 


"l 

0 

o' 

A = 

0 

4 

0 


0 

1 

3 

4 


0 

1 

0 

; A- 1 = 

0 

1 

0 


0 

1 

3 

4 


0 

1 

4 

0 


0 

0 

1 


0 

0 

1 


0 

0 

1 


_0 

0 

1 


0 

0 

1 


_0 

0 

1 


(answer is not unique) 

27. Add -1 times the first row to the second row; add -1 times the second row to the first row; add -1 times the first row to the third
row (answer is not unique)

True/False 1.5 

(a) False (b) True (c) True (d) True (e) True (f) True (g) False


Exercise Set 1.6 (page 66) 

1. x_1 = 3, x_2 = -1    3. x_1 = -1, x_2 = 4, x_3 = -7    5. x = 1, y = 5, and z = -1    x_1 = 2b_1 - 5b_2, x_2 = -b_1 + 3b_2

(i) x 1 = ==, x 2 = 1 \j (ii) Jti = f \,x 2 =Y 2 

(i) xi = J 5 , x 2 = j~s (ii) xi = || , x 2 = f§ (iii) x l = j|, x 2 = § (iv) Xi = -\,x 2 =\ 


13. The system is consistent for all values of b_1 and b_2.

b_1 = b_2 + b_3

b_1 = b_3 + b_4 and b_2 = 2b_3 + b_4


11 

12 

-3 

27 

26" 




II 

c\ 

-6 

-8 

1 

-18 

-17 





— !5 

-21 

9 

-38 

— 35_ 





True/False 1.6 

(a) True (b) True (c) True (d) True (e) True (f) True (g) True 


Exercise Set 1.7 (page 72) 

(a) Upper triangular and invertible 

(b) Lower triangular and not invertible 

(c) Diagonal, upper triangular, lower triangular, and invertible 

(d) Upper triangular and not invertible 



"6 

3" 


"—15 

10 

0 

20 

-20" 

















"l o" 


"l o' 


"l 0 

3. 

4 

-1 

5. 

2 

-10 

6 

0 

6 

II 

0 4 

, A~ 2 = 

0 i 

, A~ k = 

0 rb 


_4 

10 


18 

-6 

-6 

-6 

-6 




L 4 J 


L (-2) J 



- 1 

0 

o' 


"4 

0 

0 " 


" 2 * 

0 

o' 


"0 

0 

0 " 



II 

On 

4 

0 

1 

0 

, A~ 2 = 

0 

9 

0 

,A-* = 

0 

3 fc 

0 

11 . 

0 

0 

0 

13. 

1 

0 



9 













0 

-1 


_0 

0 

1 

16- 


_0 

0 

! 6 _ 


_0 

0 

4 k _ 


_0 

0 

0 _ 


L J 






sb 





'1 

3 

7 

2 " 

au 

av 


ra 

tc 















2 -f 


3 

1 

-8 

—3 

bw 

bx 

(b) 

ua 

vb 

yb 

U)C 

(a) 

-1 3 

(b) 

7 

-8 

0 

9 

_cy 

cz_ 


xa 

zc _ 




.2 

-3 

9 

0 _ 


Not invertible
Invertible
-3, 5, -6
a = -8
All x such that x ≠ 1, x ≠ -2, and x ≠ 4

1 0 O' 

0-1 0 


29. They are reciprocals of the corresponding diagonal entries of the matrix A. 31. 


0 0-1 




(a) Symmetric (b) Not symmetric (unless n = 1) (c) Symmetric (d) Not symmetric (unless n = 1) 




0 

0 

4' 


'0 

0 

-8~ 

"l 10' 









0 -2 

(a) 

0 

0 

1 

(b) 

0 

0 

-4 

L - 


_— 4 

-1 

0_ 


_8 

4 

0_ 


True/False 1.7 

(a) True (b) False (c) False (d) True (e) True (f) False (g) False (h) True (i) True (j) False (k) False
(l) False (m) True


Exercise Set 1.8 (page 82) 

(a) Domain: R^2; codomain: R^3

(b) Domain: R^3; codomain: R^2

(c) Domain: R^3; codomain: R^3

(d) Domain: R^6; codomain: R


3. (a) Domain: R^2; codomain: R^2
(b) Domain: R^2; codomain: R^3


5. (a) Domain: R^3; codomain: R^2
(b) Domain: R^2; codomain: R^3


7. (a) Domain: R^2; codomain: R^2

9. Domain: R^2; codomain: R^3

(a) 

'2 -3 f 

3 5 -1 

(b) 

'7 

0 

2 

-1 

-8~ 

5 

(b) Domain: R^3; codomain: R^2




_4 

7 

-1 


13. (a) 


' 0 
-1 
1 
1 


15. 


'3 5 

4 -1 

3 2 


r 

0 
3 

-1, 

-T 

1 

-1 






'0 

0 

O' 


"0 

0 

0 

1 


7 

2 

-1 

r 


0 

0 

0 


1 

0 

0 

0 

(b) 

0 

1 

1 

0 

(c) 

0 

0 

0 

(d) 

0 

0 

1 

0 


-1 

0 

0 

0_ 


0 

0 

0 


0 

1 

0 

0 





0 

0 

0 


1 

0 

-1 

0 


T(-1, 2, 4) = (3, -2, -3)


'-1 l" 


"2 

-1 

r 

0 1 

■T (-1,4) =(5,4) (b) 

0 

1 

i 

0 

0 

0 


17. (a) 


(a) T a (x) = 


T(2, 1, -3) = (0, -2, 0)


'l 2 

3' 


"-l" 

3 4 

-2 


1 



"1 

0 

4' 


"2” 

27. 

3 

0 

-3 

; t (x) = 

6 


_0 

1 

-1_ 


_1_ 


(b) T a (x) = 


29. (a + c, b + d) 


- 1 ' 

1 

3 



"-1" 


"3" 


0" 


"2" 


O' 

(a) 7 a (e i) = 

2 

4_ 

, T a (e 2 ) = 

1 

_5_ 

, T a (ej) = 

2 

3_ 

(b) 

5 

_6_ 

(c) 

14 

_— 2!_ 


2 No, unless b = 0 


True/False 1.8 

(a) False (b) False (c) True (d) False (e) True 


(f) False 


(g) False 




Exercise Set 1.9 (page 94) 


50 



3. (a) x_2 - x_3 = 100
x_3 - x_4 = -500
x_1 - x_2 = 300
-x_1 + x_4 = 100

(b) x_1 = -100 + s, x_2 = -400 + s, x_3 = -500 + s, x_4 = s

(c) To keep the traffic flowing on all roads, the flow from A to B must exceed 500 vehicles per hour.
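
The answer to Exercise 3 can be checked mechanically. The sketch below assumes the four equations and the one-parameter solution read as printed in parts (a) and (b), with the last equation taken to be -x_1 + x_4 = 100; the function names are hypothetical and exist only for this illustration.

```python
# Sketch: verify the one-parameter solution of the traffic-flow system in 3(a).
# Each entry is (coefficients of x1..x4, right-hand side).
EQUATIONS = [
    ((0, 1, -1, 0), 100),    # x2 - x3 = 100
    ((0, 0, 1, -1), -500),   # x3 - x4 = -500
    ((1, -1, 0, 0), 300),    # x1 - x2 = 300
    ((-1, 0, 0, 1), 100),    # -x1 + x4 = 100 (assumed reading of the garbled line)
]

def flows(s):
    """General solution from 3(b), with the free parameter s = x4."""
    return (-100 + s, -400 + s, -500 + s, s)

def satisfies_system(x):
    """True if x = (x1, x2, x3, x4) satisfies every equation exactly."""
    return all(sum(c * xi for c, xi in zip(coeffs, x)) == rhs
               for coeffs, rhs in EQUATIONS)

def all_flows_positive(s):
    """True if every road carries a strictly positive flow for this s."""
    return all(xi > 0 for xi in flows(s))
```

Every value of s satisfies the system, but x_3 = -500 + s is positive only when s exceeds 500, which matches the threshold stated in part (c).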


I_1 = 2.6 A, I_2 = -0.4 A, I_3 = 2.2 A

I_1 = I_4 = I_5 = I_6 = 0.5 A, I_2 = I_3 = 0 A

C3H8 + 5O2 → 3CO2 + 4H2O

11. CH3COF + H2O → CH3COOH + HF

2 - 2x + x^2

1 + T x - W

(a) p(x) = 1 + (1 - t)x + tx^2
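
The two balanced chemical equations listed among the answers above (propane combustion, and CH3COF reacting with water) can be checked by counting atoms on each side. This is a minimal sketch: the formulas are encoded by hand as element-count dictionaries, and the names `atom_totals` and `is_balanced` are hypothetical helpers, not part of the text.

```python
# Sketch: check that a chemical equation is balanced by counting atoms.
# Formulas are written as {element: count} dictionaries by hand
# (an illustrative encoding, not a formula parser).
from collections import Counter

def atom_totals(side):
    """Total atoms on one side; side is a list of (coefficient, formula)."""
    totals = Counter()
    for coefficient, formula in side:
        for element, count in formula.items():
            totals[element] += coefficient * count
    return totals

def is_balanced(reactants, products):
    """True if both sides contain the same number of each kind of atom."""
    return atom_totals(reactants) == atom_totals(products)

C3H8 = {"C": 3, "H": 8}
O2 = {"O": 2}
CO2 = {"C": 1, "O": 2}
H2O = {"H": 2, "O": 1}
CH3COF = {"C": 2, "H": 3, "O": 1, "F": 1}
CH3COOH = {"C": 2, "H": 4, "O": 2}
HF = {"H": 1, "F": 1}

# C3H8 + 5 O2 -> 3 CO2 + 4 H2O
propane = is_balanced([(1, C3H8), (5, O2)], [(3, CO2), (4, H2O)])

# CH3COF + H2O -> CH3COOH + HF
acetyl_fluoride = is_balanced([(1, CH3COF), (1, H2O)], [(1, CH3COOH), (1, HF)])
```

In the exercises themselves the coefficients come from solving a homogeneous linear system; the check here only confirms a proposed set of coefficients.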



True/False 1.9 

(a) False (b) False (c) True (d) False (e) False 


Exercise Set 1.10 (page 100)

1. (a) [0.50 0.25; 0.25 0.10]

(b) M must produce approximately $25,290.32 worth of mechanical work and B must produce approximately $22,580.65 worth of body work
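
The figures in part (b) are what an open Leontief model (I - C)x = d would give for the consumption matrix in part (a). The sketch below makes one loud assumption: the outside demand d = (7000, 14000) dollars is not printed in this answer key and is used here only because it reproduces the stated amounts. The helper `solve_2x2` is a hypothetical name.

```python
# Sketch: solve the open Leontief model (I - C) x = d for a 2x2 economy.
# C is the consumption matrix read from 1(a). The outside demand d is an
# ASSUMPTION: it does not appear in this answer key.

C = [[0.50, 0.25],
     [0.25, 0.10]]
d = [7000.0, 14000.0]   # assumed outside demand for (M, B), in dollars

def solve_2x2(M, rhs):
    """Solve a 2x2 linear system M x = rhs by Cramer's rule."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    if det == 0:
        raise ValueError("matrix is singular")
    x0 = (rhs[0] * M[1][1] - M[0][1] * rhs[1]) / det
    x1 = (M[0][0] * rhs[1] - rhs[0] * M[1][0]) / det
    return [x0, x1]

# Leontief matrix I - C.
I_minus_C = [[1 - C[0][0], -C[0][1]],
             [-C[1][0], 1 - C[1][1]]]

# Production levels for M and B that meet the assumed demand.
production = solve_2x2(I_minus_C, d)
```

Under that assumed demand, `production` rounds to the two dollar amounts quoted in part (b).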



(a) [0.10 0.60 0.40; 0.30 0.20 0.30; 0.40 0.10 0.20]

(b) [$31,500; $26,500; $26,300]

5. x ≈ [123.08; 202.56]



True/False 1.10 

(a) False (b) True (c) False (d) True (e) True 


Chapter 1 Supplementary Exercises (page 101) 


1. 3x_1 - x_2 + 4x_4 = 1

2x_1 + 3x_3 + 3x_4 = -1

Jtl = — Is — h — X 2 = — — \t — X 3 = S, X4 = t 


5. X' = \x + \y, y' = -*x+\ y 7. x = 4, y = 2, z = 3 

(a) a ≠ 0 and b ≠ 2 (b) a ≠ 0 and b = 2 (c) a = 0 at


2x_1 - 4x_2 + x_3 = 6

-4x_1 + 3x_3 = -1

x_2 - x_3 = 3


Xi = 


11 . 


13. (a) 


-1 

6 


3 -1 
0 1 


(b) 


1 -2 
3 1 


(c) 


= 2 

(d) a 

113 

160 ~ 

37 

37 

20 

46 

37 

37 - 


x 2 = — T > *3 = — - 


a = 1, b = -2, c = 3 




Exercise Set 2.1 (page 111)


(a) M_13 = 0, C_13 = 0

(b) M_23 = -96, C_23 = 96

(c) M_22 = -48, C_22 = -48

(d) M_21 = 72, C_21 = -72


- 2 

- 5 “ 


~ -2 

- 7 “ 

11 

22 

7. 59; 

59 

59 

1 

3 

7 

-5 

- 11 

22 - 


- 59 

59 - 


1. M_11 = 29, C_11 = 29
M_12 = 21, C_12 = -21
M_13 = 27, C_13 = 27
M_21 = -11, C_21 = 11
M_22 = 13, C_22 = 13
M_23 = -5, C_23 = 5
M_31 = -19, C_31 = -19
M_32 = -19, C_32 = 19
M_33 = 19, C_33 = 19

3. (a) M_13 = 0, C_13 = 0

5. 22;

a^2 - 5a + 21
-65
13. -123
λ = -3 or λ = 1
λ = 1 or λ = -1
(all parts) -123
21. -40

0 —240 —1 0 6 (a) The determinant is 1 . (b) The determinant is 1 . d[ + X = d 2 

37. If n = 1 then the determinant is 1. If n ≥ 2 then the determinant is 0.


True/False 2.1 

(a) False (b) False (c) True (d) True (e) True (f) True (g) False (h) False (i) False (j) True 


Exercise Set 2.2 (page 117) 

5. -5 7. -1 9. 33 11. 6 13. -2 15. -6 17. 72 19. -6 

21 18 -24 33 det(S) = (— 1) L,,/2J det(A) 


True/False 2.2 

(a) True (b) True (c) False (d) False (e) True (f) True 


Exercise Set 2.3 (page 127) 

5. det(A + B) ≠ det(A) + det(B) 7. Invertible 9. Invertible 11. Not invertible 13. Invertible


k jz and k £ k £ -1 Invertible; A~ l = 


3 -5 -5 

-3 4 5 

2 -2 -3 


21. Invertible; A 1 = 


0 1 


0 0 


Invertible; A 1 = 


5. (a) 189 (b) f (c) § (d) £ 


‘-4 

3 

o -r 

2 

-1 

0 0 

-7 

0 

-1 8 

6 

0 

1 -7. 

not apply. 

31. y = 


v 3 2 1 11 y 30 38 40 

ZS. X — n, y — TT> z — — IT x l — — IT’ X2 — — TT> x 3 ~ ~ IT 


True/False 2.3 

(a) False (b) False (c) True (d) False (e) True (f) True (g) True (h) True (i) True (j) True (k) True 
(l) False




Chapter 2 Supplementary Exercises (page 129) 

1. -18 3. 24 5. -10 7. 329 9. Exercise 3: 24; Exercise 4: 0; Exercise 5: -10; Exercise 6: -48


11. The matrices in Exercises 1-3 are invertible; the matrix in Exercise 4 is not. 


17. 


13. 


-b^2 + 5b - 21


15. -120 













r 10 

2 

52 

27 1 




r 1 

1 

3 n 


r 1 

2 

1 “I 


329 

329 

329 

329 

r 1 

in 


8 

8 

8 


5 

5 

10 


55 

11 

43 

16 

6 

9 

19. 




21. 




23. 

329 

329 

329 

329 













1 

2 

8 

24 

24 

5 

5 

5 

3 

10 

25 

6 

L 6 

9J 


1 

7 

1 


2 

6 

3 


47 

47 

47 

47 




L 4 

12 

12 J 


L 5 

5 

10 J 


31 

72 

102 

15 












L- 329 

329 

329 

329-1 

x' = 

fx + fy, y = 

\y 

- \x 

29. (b) cos ji 

a 2 +c 2 - 

b 2 

cos y 

a 2 W-c 





2 ac 


lab 






Exercise Set 3.1 (page 140) 

1. (a) (3, -4) (b) (2, -3, 4) 3. (a) (-1, 3) (b) (-3, 6, 1) 5. (a) (2, 3) (b) (-2, -2, -1) 

(a) (-1, 2, -4) is one possible answer (b) (7, -2, -6) is one possible answer
9. (a) (1, -4) (b) (-12, 8) (c) (38, 28) (d) (4, 29) 

(a) (-1,9, -11, 1) (b) (-13, 13, -36, -2) (c) (-90, -114, 60, -36) (d) (27, 29, -27, 9) 

(— y , 7, — y , — |) (a) Not parallel to u (b) Parallel to u (c) Parallel tou a = 3, b = — 1 

C1=2,C2 = -1,C3 = 5 (a)(|,-l,-i) (b)(f,-f,i) (a) (-2, 5) (b) (3, -8) (7,-3, 

29. (a) 0 (b) 0 (c) — a 31. Magnitude of F is \/84 lb « 9.17 lb; the angle with the positive x-axis —70.9° 

33. ^ lb ~ 183.01 lb and lb - 224.14 lb 

True/False 3.1 

(a) False (b) False (c) False (d) True (e) True (f) False (g) False (h) True (i) False (j) True 


Exercise Set 3.2 (page 153) 

(a) ||v|| = 2V3; pfV = 7f) ; “M v = (“7I> “Tr “71 ) 

(b) IMI = x/15 ; ij^jfV = o, ^j=, °> _ 7H’ “TIs) 

(a) x/83 (b) X/I7+V26 (c) 2^3 (d) a/466 (a) a/ 2570 (b) 3 a/46 - 10a/ 2I + V42 (c) 2 a/%6 

k = | or k= — | (a) u • v = —8; u • u = 26; v • v = 24 (b) u • v = 0; u ■ u = 54; v • v = 21 

(a) d (u, v) = vT4; cosd=-)= ; the angle is acute 

(b) d (u, v) = v3 9; cos 9 = ^0^ ; the angle is obtuse 

(a) Does not make sense; v • w is a scalar, whereas the dot product is only defined for vectors 

(b) Makes sense 

(c) Does not make sense; u • v is a scalar, whereas the norm is only defined for vectors 

(d) Makes sense 

25. 71°, 61°, 36°


-19) 


(k) False 


True/False 3.2 

(a) True (b) True (c) False (d) True (e) True (f) False (g) False (h) False (i) True (j) True




Exercise Set 3.3 (page 162) 

(a) Orthogonal (b) Not orthogonal (c) Not orthogonal (d) Not orthogonal 


— 2 (jc+1) + ( 3 ; — 3) — (t:+2) = 0 2z = 0 Not parallel Parallel 

(a) \ (b) J= 15. (0,0), (6, 2) (-if, 0, -f§), (f|, 1, -i|) (1, 

1 -j= 25. | ((7!' 7!’ ~ 7s) * s one possible answer 


11 . Not perpendicular 

_t j_ _J_) 12 s g. 2i\ 

5’ 10’ 10 / ’ V 5 ’ 5 ’ 10 ’ 10 / 

Yes Nm ^ 


35,355 Nm 


True/False 3.3 

(a) True (b) True (c) True (d) True (e) True (f) False (g) False 


Exercise Set 3.4 (page 170) 

Vector equation: (x, y) = (-4, 1) + t(0, -8);
parametric equations: x = -4, y = 1 - 8t

Vector equation: (x, y, z) = t(-3, 0, 1);
parametric equations: x = -3t, y = 0, z = t

5. Point: (3, -6); vector: (-5, -1)

7. Point: (4, 6); vector: (-6, -6)

Vector equation: (x, y, z) = (-3, 1, 0) + t_1(0, -3, 6) + t_2(-5, 1, 2);
parametric equations: x = -3 - 5t_2, y = 1 - 3t_1 + t_2, z = 6t_1 + 2t_2

11. Vector equation: (x, y, z) = (-1, 1, 4) + t_1(6, -1, 0) + t_2(-1, 3, 1);
parametric equations: x = -1 + 6t_1 - t_2, y = 1 - t_1 + 3t_2, z = 4 + t_2

13. Vector equation: (x, y) = t(3, 2);

parametric equations: x = 3t and y = 2t

Vector equation: (x, y, z) = t_1(5, 0, 4) + t_2(0, 1, 0);
parametric equations: x = 5t_1, y = t_2, and z = 4t_1
r X! = — i — t , X 2 = S , X 3 = t 

15 Xi = — yS — |f , x 2 = — + |f , x 3 = r, x 4 = 5, x 5 = ? 

(a) (x, y, z) = (1, 0, 0) + (-s - t, s, t)

(b) A plane passing through the point (1, 0, 0) and parallel to the vectors (-1, 1, 0) and (-1, 0, 1).

(a) x+ y + z = 0 (b) A straight line passing through the origin (c) x = — | t,y = —\t,z = t 

—2x + 3y =0 

(a) X! = — |j + I?, x 2 = s, x 3 = t (c) (xi, x 2 , x 3 ) = (1, 0, 1) + (— + \t, s, t ) 

27 Xi = | — |r — x 2 = r, x 3 = j, x 4 = 1; 

general solution of the associated homogeneous system: (— |r — (,s, r, ,s, 0 ); 
particular solution of the nonhomogeneous system: (), 0 , 0 . 1 ) 

29. If T(v) = 0 then the image is a single point; otherwise the image is a line.

True/False 3.4 

(a) True (b) False (c) True (d) True (e) False (f) True 


Exercise Set 3.5 (page 179) 

1. (a) (32, -6, -4) (b) (-32, 6, 4) (c) (52, -29, 10) (d) 0 (e) (0, 0, 0) (f) (0, 0, 0)

||u × w||^2 = 1125    u × (v × w) = (-14, -20, -82)    u × v = (18, 36, -18)    √59    3




13. 7 15. 17. 16 19. The vectors do not lie in the same plane. 21. —92 23. abc 25. (a) —3 (b) 3 (c) 3 

(a) A (b) & 2 2(v x u) (a) 1500^2 Nm « 2121.32 Nm (b) 132°, 109°, 132° (a) f (b) \ 

True/False 3.5 

(a) True (b) True (c) False (d) True (e) False (f) False 


Chapter 3 Supplementary Exercises (page 181) 

(a) (13, -3, 10) (b) VTO (c) 3^ (d) (-§, f , f ) (e) -122 (f) (-3150, -2430, 1170) 

(a) (-5,-12, 20,-2) (b) >/I06 (c) V28l0 (d) (-^, -$,*,$) 

The plane containing A, B, and C. 7. (-1, -1, 5) 11.

Vector equation: (x, y, z) = (-2, 1, 3) + t_1(1, -2, -2) + t_2(5, -1, -5);
parametric equations: x = -2 + t_1 + 5t_2, y = 1 - 2t_1 - t_2, z = 3 - 2t_1 - 5t_2

Vector equation: (x, y) = (0, -3) + t(8, -1);
parametric equations: x = 8t, y = -3 - t

Vector equation: (x, y) = (0, -5) + t(1, 3);
parametric equations: x = t, y = -5 + 3t

3(x + 1) + 6(y - 5) + 2(z - 6) = 0
-18(x - 9) - 51y - 24(z - 4) = 0
A plane through the origin


Exercise Set 4.1 (page 190) 

(a) u + v = (2, 6); ku = (0, 6) (c) Axioms 1-5 Vector space Not a vector space; Axioms 5 and 6 fail. 

Not a vector space; Axiom 8 fails. Vector space Vector space 1 = u~ l 

True/False 4.1 

(a) True (b) False (c) False (d) False (e) True (f) False 


Exercise Set 4.2 (page 200) 

1. (a), (c), (e) 3. (a), (b), (d) 5. (a), (c), (d) 7. (a), (c) 9. (a), (b) 

11. (a) The vectors span R^3 (b) The vectors do not span R^3    The polynomials do not span P_2

(a) Line; x = — \t, y = — ft, z = t (b) Origin (c) Plane; x — 3v + z = 0 (d) Line; x = — 3f, y = —2 1, z = t 

19. (a) The set spans R 1 (b) The set does not span R 2 

True/False 4.2 

(a) True (b) True (c) False (d) False (e) False (f) True (g) True (h) False (i) False (j) True (k) False 


Exercise Set 4.3 (page 210) 

(a) u_2 = -5u_1 (b) A set of 3 vectors in R^2 must be linearly dependent by Theorem 4.3.3.

(c) p_2 = 2p_1 (d) A = (-1)B    (a) Linearly dependent (b) Linearly independent

5. (a) Linearly independent (b) Linearly independent 




(a) The vectors do not lie in a plane (b) The vectors lie in a plane 

(b) V! = |v 2 - f V 3 ; V 2 = |v! + |v 3 ; v 3 = - |v 3 + |v 2 k = - \ , k = 1 

(a) Linearly independent (b) Linearly dependent (a) Linearly independent (b) Linearly dependent 

True/False 4.3 

(a) False (b) True (c) False (d) True (e) True (f) False (g) True (h) False 


Exercise Set 4.4 (page 219) 

(a) (m’ m) (b) («, b ~f) (a) (3, -2, 1) (b) (-2, 0, 1) A = 1 A 3 — 1A 2 + 1A 3 — 1A 4 ; 

(A) s = (1,-1, 1,-1) 

p = 7p 3 — 8p, + 3p 3 ; (a) Linearly independent (b) Linearly dependent 

(P)s = (7, -8, 3) 

(a) (2, 0) (b) (^, - ^) (c) (0, 1) (d) (J, b - (b) (3, 4, 2, 1) 

r —2i — i ch" 

27. (a) (20, 17, 2) (b) 3x 2 + 8x - 1 (c) _ 1Q6 3Q 

True/False 4.4 

(a) False (b) False (c) True (d) True (e) False 


Exercise Set 4.5 (page 228) 

1. Basis: {(1, 0, 1)}; dimension: 1 3. No basis; dimension: 0 5. Basis: {(3, 1, 0), (-1, 0, 1)}; dimension: 2

(a) Basis: {(|, 1. 0) , (— |, 0, l)j; dimension: 2 (b) Basis: {(1, 1, 0) , (0. 0, 1)); dimension: 2 

(c) Basis: {(2, -1, 4)}; dimension: 1 (d) Basis: S = {(1, 1, 0), (0, 1, 1)}; dimension: 2    (a) n (b) n(n + 1)/2 (c) n(n + 1)/2

11. (b) Dimension: 2 (c) Basis: {-1 + x, -1 + x^2} 13. e_2 and e_3 (the answer is not unique)

15. Vi, v 2 , and e 3 form a basis for R 3 (the answer is not unique) 17. { v 3 , v 2 ) (the answer is not unique) 

19. (a) 1 (b) 2 (c) 1 

(a) {-1 + x - 2x^2, 3 + 3x + 6x^2, 9} (the answer is not unique)

(b) {1 + x, x^2} (the answer is not unique)

(c) {1 + x - 3x^2} (the answer is not unique)

True/False 4.5 

(a) True (b) True (c) False (d) True (e) True (f) True (g) True (h) True (i) True (j) False (k) False 


Exercise Set 4.6 


1 (a) 


13 

10 

2 


(b) 


(page 235) 

~ 0 -f 


(c) [w] B = 


: Mi 



L 5 U J 

'32 

L “ ”TJ 

9" 

L 

5 _ 

" 7“ 

2 

(a) 

-2 -3 

5 1 6 

(b) [w]b = 

-9 

-5 

; Mb' = 

23 

2 

6 


- 

\ °" 


2 


f 

(C) 

1 1 

6 3 _ 

(d) Mb = 

-5 

l Mb' = 

-2 





(a) 


(a) 


11 (a) 


3 5" 


2 5" 


2 


"-i" 


3" 


A 

1 -2 

(b) 

_— 1 — 3_ 

(d) Mb, = 

-1 

; Mb 2 = 

l 

(e) Mb 2 = 

-1 

; M«, = 

-1 









L - 1 




1 2 3" 


"—40 16 9" 


"-239" 


5" 


3" 


"-200“ 

2 5 3 

(b) 

13 -5 -3 

(d) Mb = 

77 

; Ms = 

-3 

(e) [w] s = 

-5 

; Mb = 

64 

1 0 8 


5 -2 -1 


30 


1 


0 


25 


cos ( 29 ) sin (29) 
sin (28) — cos (29) 


13. P~'Q-' 


(a) B = {(1,1,0) (1,0, 2) (0,2,1)} (b) B = {(f, i, -§) , (}, 


).(-§. MM 


17. 


2 

5 


3 

-1 


19. B must be the standard basis. 


True/False 4.6 

(a) True (b) True (c) True (d) True (e) False (f) False 


Exercise Set 4.7 (page 246) 

L (a) 1 






"4“ 


0" 


'- 1 “ 

2 


‘3‘ 







-1 

+ 2 

4 

(b) -2 

3 

+ 3 

6 

+ 5 

2 

L - 




_ 0 _ 


_-l_ 


4_ 


3 (a) b is not in the column space of A (b) b is in the column space of A; 


1 


-1 


1 


5 

9 

- 3 

3 

+ 

1 

= 

1 

_ 1 _ 


1 


_1 


_-l_ 



'5' 


'- 2 ' 


r°i 


r 


'5' 


'- 2 ' 


' 0 ‘ 


0 


1 


0 


0 


0 


1 


0 

(a) /• 

0 

+ ^ 

1 

+ t 

1 

(b) 

-1 

+ r 

0 

+ ^ 

1 

+ t 

1 


0 


0 


1 


5 


0 


0 


1 


(a) (1.0) + f (3, 1); f (3, 1) (b) (-2, 7, 0) + f(-l, -1, 1); f(— 1, — 1, 1) 

"16“ 


9 (a) Basis for the null space: 


(b) Basis for the null space: 


1 (a) Basis for the column space: 


; basis for the row space: {[1 0 -16], [0 1 -19]} 

; basis for the row space: { [ 1 0 — ) ] } 

; basis for the row space: {[l 0 2] . [0 0 l]j 



T 


"2" 



0 

, 

1 



_0_ 


_0_ 



T 



3 - 


0 


1 


0 

’ 

0 


.0, 


0_ 


(b) Basis for the column space: 


; basis for the row space: {[l —3 0 0] . [0 1 0 0]} 




(a) Basis for the row space: {[l 0 11 0 3] , [0 1 3 0 0] , [0 0 0 1 0]j; 


basis for the column space: 



r n 


r- 2 i 


r°i 



-2 


5 


0 



-1 

’ 

3 

’ 

1 



-3 


8 


1 


- 

2 5 


-7 

0 

-6] 


(b) {[1 -2 5 0 

15. {(1, 1, 0, 0) , (0, 0, 1. 1) , (-2, 0, 2, 2) , (0, -3, 0, 3)} 
19. {[1 4 5 6 9], [3 -2 1 4 -l]} 


-13-2 1 -3]} 

1 Basis: {v t , v 2 , v 4 ); v 3 = 2vi — v 2 ; v 5 = — vi + 3v 2 + 2v 4 


Since Ta (x) = Ax, we are seeking the general solution of the linear system Ax = b. 

(a) x = f (— §, f , l) 

(b) x=G,— §,0) + *(— 1,1,1) 

(c) x = (3 , — | , 0) + f (— f , | , l) 


(b) 


0 

0 

0 


0 

1 

0 


o' 

0 

1 


is an example of such a matrix 


(a) 


3 a 
3b 


— 5a 
-5b 


where a and b are not both zero 


(b) Only the zero vector forms the null space for both A and B. 
The line 3x + y = 0 forms the null space for C. 

The entire plane forms the null space for D. 


True/False 4.7 

(a) True (b) False (c) False (d) False (e) False (f) True (g) True (h) False (i) True (j) False 


Exercise Set 4.8 (page 256) 

(a) rank(A) = 1; nullity(A) = 3 (b) rank(A) = 2; nullity(A) = 3 

(a) rank(A) = 3; nullity(A) = 0 (c) 3 leading variables; 0 parameters in the general solution (the solution is unique) 

(a) rank(A) = 1; nullity (A) = 2 (c) 1 leading variable; 2 parameters in the general solution 

(a) largest possible value for the rank: 4; smallest possible value for the nullity: 0 

(b) largest possible value for the rank: 3; smallest possible value for the nullity: 2 

(c) largest possible value for the rank: 3; smallest possible value for the nullity: 0 



                                       (a)   (b)   (c)   (d)   (e)   (f)   (g)
dimension of the row space of A         3     2     1     2     2     0     2
dimension of the column space of A      3     2     1     2     2     0     2
dimension of the null space of A        0     1     2     7     7     4     0
dimension of the null space of A^T      0     1     2     3     3     4     4
is the system Ax = b consistent?       Yes   No    Yes   Yes   No    Yes   Yes
number of parameters in the
general solution of Ax = b              0     -     2     7     -     4     0


(a) nullity(A) - nullity(A^T) = 1 (b) nullity(A) - nullity(A^T) = n - m    (a) 3 (b) 2

The matrix cannot have rank 1 . It has rank 2 if r = 2 and s = 1 . 


17 No, both row and column spaces of A must be planes. 
19. (a) 3 (b) 5 (c) 3 (d) 3 21. (a) 3 (b) No 




27 (a) Overdetermined; inconsistent if 3/?i + + 2b^ ^ 0 

(b) Underdetermined; infinitely many solutions for all b's (cannot be inconsistent)

(c) Underdetermined; infinitely many solutions for all b's (cannot be inconsistent)


True/False 4.8 

(a) False (b) True (c) False (d) False (e) True (f) False (g) False (h) False (i) True (j) False 


Exercise Set 4.9 (page 268) 

I (a) 


1 o" 

"-l" 

_ 

"-l" 

(b) 

’-1 o" 

'-f 

_ 

V 

(c) 

"o r 

"-f 

_ 

2 

0 -1 

2 


-2 

0 1 

2 


2 

1 0 

2 


-1 



'1 

0 

o' 


2" 


2" 


(a) 

0 

1 

0 


-5 

= 

-5 

(b) 


0 

0 

-1 


3 


-3 



T 0 O' 
0-1 0 
0 0 1 



2" 


"2" 



-5 

= 

5 

(c) 


3_ 


_3_ 



-1 0 

0 1 

0 0 


2 

-5 

3 


- 2 ' 

-5 

3 



"l o' 

2 


Y 


"0 o' 

2 


o' 

(a) 



= 


(b) 



= 



0 0 

-5 


0 


0 1 

-5 


-5 



"1 

0 

0" 


"-2" 


"-2" 


'1 

0 

o' 


"-2" 


"-2" 


"0 

0 

0" 


"-2" 


"O' 

(a) 

0 

1 

0 


1 

= 

1 

(b) 

0 

0 

0 


1 

= 

0 

(c) 

0 

1 

0 


1 

= 

1 


0 

0 

0_ 


3 


0 


0 

0 

1 


3 


3 


_0 

0 

1 


3 


3 


9 (a) 


(c) 


II. (a) 


(c) 


13. (a) 


'VI 

2 

\ 

. 2 
2 

VI 

L 2 
1 
0 


_ 1 
2 

VI 

2 

2 

2 


0 0 


3 

-4 


3V3 


+ 2 


LI-2V3 


4.60" 


1 4V 

3 ' 


' §-2vT 


’—1.96" 

-1.96 

(b) 

41 1 

L 2 2 J 

-4 

— 

3%/3 rx 

L 2 Z J 


-4.60 


3’ 


i_4T 


4 . 95 ' 


"O 

-l" 

3’ 


-4 

— 

2 

42 

L 2 J 


-0.71 

(d) 

1 

0 

-4 

— 


VI 

2 


o -i 


Y o 
o 1 

& o 


1 

2 

2 

_v?' 

2 

0 

vi 
2 . 



2" 


' 2 ' 



-1 

= 

1-# 

(b) 


2_ 


_| + V3. 



VI 

2 

0 


Y o 


1 o 


2" 


o' 


1 

= 

-1 

(d) 

2 


2V2 



-I 0 
'0 -1 
1 0 
0 0 


VI 

2 

o' 

0 

1 


2" 


' V3+ 1 ' 

-1 

= 

-1 

2_ 


_-l + V3_ 



2" 


T 


-1 

= 

2 


2_ 


_2_ 


o' 

'-l' 


1 " 


'3 o' 

'-f 


'-3' 



— 

2 

(b) 



— 


1 

2 _ 

2 


1 

0 3 

2 


6 



" 1 

4 

0 

0“ 


2" 


- 1 - 

2 


"2 

0 

o' 


2" 


4' 

(a) 

0 

1 

4 

0 


-1 

= 

1 

4 

(b) 

0 

2 

0 


-1 

= 

-2 


_0 

0 

1 

4- 


3_ 


3 

- 4- 


_0 

0 

2_ 


3_ 


6_ 


17. (a) 


19. (a) 


O" 

"-T 


1 " 


'1 o' 

'-l' 


’-l" 


— 

2 

(b) 



— 


1 

2 


2 

_° 5. 

2 


1 


1/a O' 

a 


ala 

(b) 

'1 o' 

a 


a 

0 1 

b 


b 

0 a 

b 


cub 




21 (a) the matrix A\ 

corresponds to the 
contraction with 
factor i 


(b) the matrix A 2 
corresponds to the 
compression in the 
^-direction with 
factor | 


(c) the matrix A 3 
corresponds to the 
shear in the 
y-direction by a 
factor ( 


(d) the matrix A 4 
corresponds to the 
shear in the 
y-direction by a 
factor — | 




(a) [2 0; 0 2] (dilation with factor 2)

(b) [1 2; 0 1] (shear in the x-direction by a factor 2)


25. The standard matrix:


27. The standard matrix:

;fys(3,4) = (!+V3,^ + 3)« 

1^/3 ( 3 , 4 ) = (-|+ 2 V 3 , ¥+2 


1 V3 

4 4 

-s/3 3 

L 4 4 J 

_i & 

2 2 

a/3 1 

L 2 2 J 


( 2 . 48 , 4 . 30 ) 

) ~ ( 1 . 96 , 4 . 60 ) 


29. Reflection about the xy-plane: T(1, 2, 3) = [1 0 0; 0 1 0; 0 0 -1] [1; 2; 3] = [1; 2; -3]

Reflection about the xz-plane: T(1, 2, 3) = [1 0 0; 0 -1 0; 0 0 1] [1; 2; 3] = [1; -2; 3]

Reflection about the yz-plane: T(1, 2, 3) = [-1 0 0; 0 1 0; 0 0 1] [1; 2; 3] = [-1; 2; 3]



"V 3 

2 

1 

2 

0 


'1 

0 

O' 


0 

0 

1 " 

- 

(a) 

1 

2 

a/3 

2 

0 

(b) 

0 

1 

2 

i 

V 2 

(C) 

0 

1 

0 

33 . 


0 

0 

i 


0 

1 

a/2 

1 

v'jJ 


-1 

0 

0 _ 

_ 


37. Rotation through the angle 2θ


8 

9 

1 

9 

4 

9 


39. Rotation through the angle θ, then translation by x_0; not a matrix transformation


4“ 

9 

4 

9 

7 

9- 


Exercise Set 4.10 (page 277) 


1 (a) Operators do not commute 
5. [7i o r A ] = 


(b) Operators do not commute 3 Operators commute 


'-10 -7" 


"-8 -3" 


"l o' 


"o o' 



; [t a o t b ] = 


(a) 


(b) 


(C) 

5 -10 


13 -12 


0 -1 


_° L 



3 

2 

V3 


'-1 

0 

0 " 


" 1 

0 

1 " 


"-1 

0 

0 

0 

0 

0 

(b) 

0 

V2 

0 

(c) 

0 

1 

0 

0 

0 

1 


-1 

0 

1 


0 

0 

0 


3 a/3 
2 

_ 3 
2 


(a) 




11. (a) [T_1] = [1 1; 1 -1]; [T_2] = [3 0; 2 4]

(b) [T_2 ∘ T_1] = [3 3; 6 -2]; [T_1 ∘ T_2] = [5 4; 1 -4]

(c) T_1(T_2(x_1, x_2)) = (5x_1 + 4x_2, x_1 - 4x_2); T_2(T_1(x_1, x_2)) = (3x_1 + 3x_2, 6x_1 - 2x_2)
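
The compositions in Exercise 11 follow the rule that the standard matrix of a composition is the product of the standard matrices, in the matching order. The sketch below assumes [T_1] = [1 1; 1 -1] and [T_2] = [3 0; 2 4], which is our reading of part (a) and agrees with the formulas in part (c); `matmul` and `apply` are hypothetical helper names.

```python
# Sketch: the standard matrix of a composition is the product of the
# standard matrices, [T2 o T1] = [T2][T1] and [T1 o T2] = [T1][T2].
# Assumption: T1 and T2 below are read from the answer to 11(a).

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def apply(M, v):
    """Image of the vector v under the matrix transformation x -> Mx."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

T1 = [[1, 1], [1, -1]]
T2 = [[3, 0], [2, 4]]

T2_after_T1 = matmul(T2, T1)   # apply T1 first, then T2
T1_after_T2 = matmul(T1, T2)   # apply T2 first, then T1

# Applying the product agrees with applying the operators one at a time.
v = [1, 1]
stepwise = apply(T1, apply(T2, v))
direct = apply(T1_after_T2, v)
```

The two products differ, so these operators do not commute; `stepwise` and `direct` agree, which is the content of the composition rule.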

13. (a) Not one-to-one (b) One-to-one (c) One-to-one (d) One-to-one 

15- (a) Reflection about the .r-axis (b) Rotation through an angle of — 7r/4 (c) Contraction by a factor of \ 


17. (a) 

(b) 


w l 


1 

'xl- 

oo 

Xi 

W 2 


2 1 

x 2 

~U)i~ 


'-1 

3 2 

U>2 

= 

2 

0 4 

Wt, 


1 

3 6 


; the operator is not one-to-one 

; the operator is not one-to-one 


19. (a) One-to-one; standard matrix of T 1 : 
21 (a) One-to-one (b) Not one-to-one 


i 

3 

1 

3 




1 


"-1" 


(a) 


5 

, 

6 




7 


4 



(b) {(-14,19,11)} 



(Wi, W 2 ) = (jWi - 


\w 2 , |wi + \w 2 ) 


(b) Not one-to-one 


(c) rank(T) = 2; nullity(T) = 1 (d) rank(A) = 2; nullity(A) = 1


25. Basis for ker(T_A): {(10, 2, 0, 7)}; basis for R(T_A):



(a) Range of T must be a proper subset of R^n (b) T maps infinitely many vectors into 0
29. (a) Yes (b) Yes 


True/False 4.10 


(a) False (b) True 

(c) True (d) False 

Exercise Set 4.11 (page 287) 

l. / = 

3. / 

= 1 X ' 

5. 

y 


(0,1) 


(1,1) 

4 

= 

2 


= 

X 

(0, 0) 

l 

(1,0) 


- 



(e) True (f) True (g) True 




(a) 


'I o' 


"l o' 


0 -l" 


(b) 


(c) 

-1 0 

0 5 

2 5 


9. (a) Operators commute (b) Operators do not commute 


11. Shearing by a factor of 1 in the x-direction, then reflection about the x-axis, then expanding by a factor of 2 in the y-direction,
then expanding by a factor of 4 in the x-direction.




13 Reflection about the x-axis, then expanding by a factor of 2 in the y-direction, then expanding by a factor of 4 in the x-direction, 
then reflection about the line y = x. 

15. (a) The unit square is expanded in the x-direction by a factor of 3. 

(b) The unit square is reflected about the x-axis and expanded in the y-direction by a factor of 5. 

(b) No, Theorem 4.11.1 applies only to invertible matrices. 




(b) Shearing by a factor of — 1 in the x-direction, then expanding by a factor of 2 in the y-direction, then shearing by a factor of 1 
in the y-direction. 


23. 


,y 

( 0 , 1 ) ( 1 , 1 ) 

T 


1 


x 



25. The line segment from (0,0) to (2,0). Theorem 4.11.1 does not apply here because A is singular. 


True/False 4.11 

(a) False (b) True (c) True (d) True (e) False (f) False (g) True 


Chapter 4 Supplementary Exercises (page 289) 

(a) $\mathbf{u} + \mathbf{v} = (4, 3, 2)$; $k\mathbf{u} = (-3, 0, 0)$ (c) Axioms 1-5

A plane if $s = 1$; a line if $s = -2$; the origin if $s \neq -2$ and $s \neq 1$ 


7 A must be invertible 


9 (a) rank is 2; nullity is 1 

(b) rank is 2; nullity is 2 

(c) For $n = 1$, rank is 1 and nullity is 0; for $n \geq 2$, rank is 2 and nullity is $n - 2$. 


11. (a) $\{1, x^2, x^4, \ldots, x^{2\lfloor n/2 \rfloor}\}$ (b) $\{1, x - x^n, x^2 - x^n, \ldots, x^{n-1} - x^n\}$



'1 

0 

o' 


"0 

1 

O' 


"0 

0 

1" 


"0 

0 

o' 


"0 

0 

0" 

- 

(a) 


0 

0 

0 

, 

1 

0 

0 

, 

0 

0 

0 

, 

0 

1 

0 

, 

0 

0 

1 

, 



_0 

0 

0 


0 

0 

0 


1 

0 

0 


0 

0 

0 


0 

1 

0 



(b) 


0 

-1 

0 



0 

0 

1 " 


, 

0 

0 

0 

, 


-1 

0 

0 _ 



0 

0 

-1 


15. Possible ranks are 0, 1, and 2. 17. (a) Yes 


O' 

1 
0 

(b) No 


(c) Yes 


0 0 
0 0 
0 0 


O' 

0 

1 




Exercise Set 5.1 (page 302) 

1. eigenvalue: $-1$ 3. eigenvalue: $5$ 

(a) Characteristic equation: $(\lambda - 5)(\lambda + 1) = 0$; 
eigenvalue: $5$, basis for eigenspace: $\{(1, 1)\}$; 
eigenvalue: $-1$, basis for eigenspace: $\{(-2, 1)\}$ 

(b) Characteristic equation: $\lambda^2 + 3 = 0$; no real eigenvalues 

(c) Characteristic equation: $(\lambda - 1)^2 = 0$; 
eigenvalue: $1$, basis for eigenspace: $\{(1, 0), (0, 1)\}$ 

(d) Characteristic equation: $(\lambda - 1)^2 = 0$; 
eigenvalue: $\lambda = 1$, basis for eigenspace: $\{(1, 0)\}$ 

Characteristic equation: $(\lambda - 1)(\lambda - 2)(\lambda - 3) = 0$; 
eigenvalue: $1$, basis for eigenspace: $\{(0, 1, 0)\}$; 
eigenvalue: $2$, basis for eigenspace: $\{(-1, 2, 2)\}$; 
eigenvalue: $3$, basis for eigenspace: $\{(-1, 1, 1)\}$ 

Characteristic equation: $(\lambda + 2)^2(\lambda - 5) = 0$; 
eigenvalue: $-2$, basis for eigenspace: $\{(1, 0, 1)\}$; 
eigenvalue: $5$, basis for eigenspace: $\{(8, 0, 1)\}$ 

Characteristic equation: $(\lambda - 3)^3 = 0$; 

eigenvalue: $3$, basis for eigenspace: $\{(0, 1, 0), (1, 0, 1)\}$ 

13. $(\lambda - 3)(\lambda - 7)(\lambda - 1) = 0$ 

15. Eigenvalue: $5$, basis for eigenspace: $\{(1, 1)\}$; 
eigenvalue: $-1$, basis for eigenspace: $\{(-2, 1)\}$ 

(b) $\lambda = -\omega$ is the eigenvalue associated with the given eigenvectors. 

(a) Eigenvalue: $1$, eigenspace: $\operatorname{span}\{(1, 1)\}$; 
eigenvalue: $-1$, eigenspace: $\operatorname{span}\{(-1, 1)\}$ 

(b) Eigenvalue: $1$, eigenspace: $\operatorname{span}\{(1, 0)\}$; 
eigenvalue: $0$, eigenspace: $\operatorname{span}\{(0, 1)\}$ 

(c) No real eigenvalues 

(d) Eigenvalue: $k$, eigenspace: $R^2$ 

(e) Eigenvalue: $1$, eigenspace: $\operatorname{span}\{(1, 0)\}$ 

(a) Eigenvalue: $1$, eigenspace: $\operatorname{span}\{(1, 0, 0), (0, 1, 0)\}$; 
eigenvalue: $-1$, eigenspace: $\operatorname{span}\{(0, 0, 1)\}$ 

(b) Eigenvalue: $1$, eigenspace: $\operatorname{span}\{(1, 0, 0), (0, 0, 1)\}$; 
eigenvalue: $0$, eigenspace: $\operatorname{span}\{(0, 1, 0)\}$ 

(c) Eigenvalue: $1$, eigenspace: $\operatorname{span}\{(1, 0, 0)\}$ 

(d) Eigenvalue: $k$, eigenspace: $R^3$ 

23. (a) $y = 2x$ and $y = x$ (b) No invariant lines

(a) $6 \times 6$ (b) Yes 


0 0 1 _ 

True/False 5.1 

(a) False (b) False (c) True (d) False (e) False (f) False 


Exercise Set 5.2 (page 313) 





'1 0 -2 

"l o" 

3 1 

(answer is not unique) 

P = 

0 1 0 



_0 0 1 _ 


(c) Three 


P = 


(answer is not unique) 




(a) 3 and 5 (b) $\operatorname{rank}(3I - A) = 1$; $\operatorname{rank}(5I - A) = 2$ (c) Yes 


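11. Eigenvalues: 1, 2, and 3; each has algebraic multiplicity 1 and geometric multiplicity 1; 

$A$ is diagonalizable; $P = \begin{bmatrix} 1 & 2 & 1 \\ 1 & 3 & 3 \\ 1 & 3 & 4 \end{bmatrix}$ (answer is not unique); $P^{-1}AP = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{bmatrix}$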


13. Eigenvalue $\lambda = 0$ has both algebraic and geometric multiplicity 2; 
eigenvalue $\lambda = 1$ has both algebraic and geometric multiplicity 1; 

$A$ is diagonalizable; $P = \begin{bmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 3 & 1 \end{bmatrix}$ (answer is not unique); $P^{-1}AP = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}$


15. (a) $A$ is a $3 \times 3$ matrix; 

all three eigenspaces (for $\lambda = 1$, $\lambda = -3$, and $\lambda = 5$) must have dimension 1. 

(b) $A$ is a $6 \times 6$ matrix; 

the possible dimensions of the eigenspace corresponding to $\lambda = 0$ are 1 or 2; 

the dimension of the eigenspace corresponding to $\lambda = 1$ must be 1; 

the possible dimensions of the eigenspace corresponding to $\lambda = 2$ are 1, 2, or 3. 


24,234 -34,815 

-23,210 35,839 



$A^{11} = \begin{bmatrix} -1 & 10{,}237 & -2{,}047 \\ 0 & 1 & 0 \\ 0 & 10{,}245 & -2{,}048 \end{bmatrix}$



'1 

-1 

1 " 


21 . 

2 

0 

-1 



1 

1 

1 



25. Yes 



(a) The dimension of the eigenspace corresponding to $\lambda = 1$ must be 1; the possible dimensions of the eigenspace corresponding 
to $\lambda = 3$ are 1 or 2; the possible dimensions of the eigenspace corresponding to $\lambda = 4$ are 1, 2, or 3. 

(b) The dimension of the eigenspace corresponding to $\lambda = 1$ must be 1; the dimension of the eigenspace corresponding to $\lambda = 3$ 
must be 2; the dimension of the eigenspace corresponding to $\lambda = 4$ must be 3. 

(c) This eigenvalue must be $\lambda = 4$. 


31. Standard matrix: $\begin{bmatrix} 0 & -1 \\ -1 & 0 \end{bmatrix}$; diagonalizable; $P = \begin{bmatrix} -1 & 1 \\ 1 & 1 \end{bmatrix}$ (answer is not unique) 






33. Standard matrix: $\begin{bmatrix} 3 & 0 & 0 \\ 0 & 1 & 0 \\ 1 & -1 & 0 \end{bmatrix}$; diagonalizable; $P = \begin{bmatrix} 0 & 0 & 3 \\ 0 & -1 & 0 \\ 1 & 1 & 1 \end{bmatrix}$ (answer is not unique) 


True/False 5.2 

(a) False (b) True (c) True (d) False (e) True (f) True (g) True (h) True (i) True 


Exercise Set 5.3 (page 326) 

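$\overline{\mathbf{u}} = (2 + i, -4i, 1 - i)$; $\operatorname{Re}(\mathbf{u}) = (2, 0, 1)$; $\operatorname{Im}(\mathbf{u}) = (-1, 4, 1)$; $\|\mathbf{u}\| = \sqrt{23}$

$\mathbf{x} = (7 - 6i, -4 - 8i, 6 - 12i)$ 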


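$\overline{A} = \begin{bmatrix} 5i & 4 \\ 2 + i & 1 - 5i \end{bmatrix}$; $\operatorname{Re}(A) = \begin{bmatrix} 0 & 4 \\ 2 & 1 \end{bmatrix}$; $\operatorname{Im}(A) = \begin{bmatrix} -5 & 0 \\ -1 & 5 \end{bmatrix}$; $\det(A) = 17 - i$; $\operatorname{tr}(A) = 1$ 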


$\mathbf{u} \cdot \mathbf{v} = -1 + i$; $\mathbf{u} \cdot \mathbf{w} = 18 - 7i$; $\mathbf{v} \cdot \mathbf{w} = 12 + 6i$

$-11 - 14i$ 

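15. Eigenvalue: $2 + i$, basis for eigenspace: $\begin{bmatrix} 2 + i \\ 1 \end{bmatrix}$; eigenvalue: $2 - i$, basis for eigenspace: $\begin{bmatrix} 2 - i \\ 1 \end{bmatrix}$

17. Eigenvalue: $4 + i$, basis for eigenspace: $\begin{bmatrix} 1 + i \\ 1 \end{bmatrix}$; eigenvalue: $4 - i$, basis for eigenspace: $\begin{bmatrix} 1 - i \\ 1 \end{bmatrix}$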




19. \X\ = V2; 0 = f 21. |A|= 2; </. = -| 
(a) A'= — 1( (b) None 


P = 


-2 

2 



-2 

3 


P = 




True/False 5.3 

(a) False (b) True (c) False (d) True (e) False (f) False 


Exercise Set 5.4 (page 332) 

(a) $y_1 = c_1 e^{5x} - 2c_2 e^{-x}$, $y_2 = c_1 e^{5x} + c_2 e^{-x}$ 

(b) $y_1 = 0$, $y_2 = 0$ 

(a) $y_1 = -c_2 e^{2x} - c_3 e^{3x}$, $y_2 = c_1 e^{x} + 2c_2 e^{2x} + c_3 e^{3x}$, $y_3 = 2c_2 e^{2x} + c_3 e^{3x}$ 

(b) $y_1 = e^{2x} - 2e^{3x}$, $y_2 = e^{x} - 2e^{2x} + 2e^{3x}$, $y_3 = -2e^{2x} + 2e^{3x}$ 


7. $y = c_1 e^{3x} - c_2 e^{-2x}$ 

9. $y = c_1 e^{x} + c_2 e^{2x} + c_3 e^{3x}$ 


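(b) $\mathbf{y}' = A\mathbf{y}$ where $\mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix}$ and $A = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ -2 & 1 & 2 \end{bmatrix}$

(c) The solution of the system: $y_1 = c_1 e^{2x} + c_2 e^{x} + c_3 e^{-x}$, $y_2 = 2c_1 e^{2x} + c_2 e^{x} - c_3 e^{-x}$, and $y_3 = 4c_1 e^{2x} + c_2 e^{x} + c_3 e^{-x}$; 
The solution of the differential equation: $y = c_1 e^{2x} + c_2 e^{x} + c_3 e^{-x}$ 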

True/False 5.4 

(a) True (b) False (c) True (d) True (e) False 


Exercise Set 5.5 (page 342) 


(a) Stochastic (b) Not stochastic (c) Stochastic (d) Not stochastic 



’0.54545’ 


"_8_“ 


~ 4 ~ 
11 

x 4 = 

0.45455 

5. (a) Regular (b) Not regular (c) Regular 

17 

9 

_ 17 _ 

9 . 

4 

11 

3 

- 11 _ 


(a) Probability that the system will stay in state 1 when it is in state 1 

(b) Probability that the system will move to state 1 when it is in state 2 

(c) 0.8 

(d) 0.85 


13. (a) $\begin{bmatrix} 0.95 & 0.55 \\ 0.05 & 0.45 \end{bmatrix}$ (b) 0.93 (c) 0.142 (d) 0.63 



                     initial   after     after     after     after     after
                     state     1 year    2 years   3 years   4 years   5 years
city population      100,000   95,750    91,840    88,243    84,933    81,889
suburb population    25,000    29,250    33,160    36,757    40,067    43,111


(b) City population will approach 46,875 and the suburbs population will approach 78,125. 
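The population figures above can be reproduced with a two-state Markov chain. The sketch below assumes a column-stochastic transition matrix in which 5% of city residents move to the suburbs and 3% of suburb residents move to the city each year; those rates are inferred from the listed numbers and are not quoted from the exercise. The names `P`, `step`, and `history` are illustrative.

```python
# Assumed transition matrix (columns sum to 1): state 0 = city, state 1 = suburbs.
P = [[0.95, 0.03],
     [0.05, 0.97]]

def step(P, x):
    """Advance the population vector x by one year: x_next = P x."""
    return [P[0][0] * x[0] + P[0][1] * x[1],
            P[1][0] * x[0] + P[1][1] * x[1]]

history = [[100_000.0, 25_000.0]]          # initial state
for _ in range(5):
    history.append(step(P, history[-1]))

# Round only for display; the iteration itself keeps full precision.
rounded = [[round(city), round(suburb)] for city, suburb in history]
for year, row in enumerate(rounded):
    print(year, row)

# Steady state of a 2-state chain: the city share is p01 / (p01 + p10).
total = sum(history[0])
steady_city = total * P[0][1] / (P[0][1] + P[1][0])
steady_suburb = total - steady_city
print(steady_city, steady_suburb)
```

Under these assumed rates the yearly values agree with the table after rounding, and the long-run split agrees with part (b).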



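$P = \begin{bmatrix} \frac{7}{10} & \frac{1}{10} & \frac{1}{5} \\ \frac{1}{5} & \frac{3}{10} & \frac{1}{2} \\ \frac{1}{10} & \frac{3}{5} & \frac{3}{10} \end{bmatrix}$; steady-state vector: $\begin{bmatrix} \frac{1}{3} \\ \frac{1}{3} \\ \frac{1}{3} \end{bmatrix}$

For any positive integer $k$, $P^k \mathbf{q} = \mathbf{q}$ 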


True/False 5.5 

(a) True (b) True (c) True (d) False (e) True (f) False (g) True 




Chapter 5 Supplementary Exercises (page 345) 


1. (b) $A$ is the standard matrix of the rotation in the plane about the origin through a positive angle $\theta$. Unless the angle is an integer 
multiple of $\pi$, no vector resulting from such a rotation is a scalar multiple of the original nonzero vector. 


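$\begin{bmatrix} 1 & 1 & 0 \\ 0 & 2 & 1 \\ 0 & 0 & 3 \end{bmatrix}$

9. $A^2 = \begin{bmatrix} 15 & 30 \\ 5 & 10 \end{bmatrix}$, $A^3 = \begin{bmatrix} 75 & 150 \\ 25 & 50 \end{bmatrix}$, $A^4 = \begin{bmatrix} 375 & 750 \\ 125 & 250 \end{bmatrix}$, $A^5 = \begin{bmatrix} 1875 & 3750 \\ 625 & 1250 \end{bmatrix}$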

11. $0$, $\operatorname{tr}(A)$ 13. All eigenvalues must be 0 


15. 



17. The only possible eigenvalues are $-1$, $0$, and $1$. 


0 O' 

1 _i 

2 2 

1 _ 1 

2 2 - 


15 The remaining eigenvalues are 2 and 3. 


Exercise Set 6.1 (page 353) 

1. (a) 12 (b) -18 (c) -9 (d) ^30 (e) VTi (f) Vm 

3. (a) 34 (b) -39 (c) -18 (d) s/¥) (e) a/34 (f) -/(AO 

i 

5. 


'a/2 0 ' 

0 V3 


7 -24 


9. 3 11 -29 


13. 


'a/3 0 ' 

0 -v/5 


|jp = a/T4 , d(p. q) = yi37 ||C/|| = V93, d(U, V) = V 99 

||u|| = V65, d(u, v) = 12V^ (a) -101 (b) 3 


-50 ||u|| = V30, d(u, v) = s/107 

3vTT llpll = 6v^,4(p,q) = llV2 


29. 


-2 

(u. v) = ^uiVi + u 2 V 2 Axioms 2 and 3 do not hold. 14(u, v) — 4 ||u|| 2 — 6 ||v|| : 

(a) | (b) (c) -\/2 (d) 0 (b) k\ and k 2 must both be positive. 



True/False 6.1 

(a) True (b) False (c) True (d) True (e) False (f) True (g) False 


Exercise Set 6.2 (page 361) 

(a) —4= (b) 0 (c) — -4= 0 ^44= (a) Orthogonal (b) Not orthogonal (c) Orthogonal 

Orthogonal if k = | The weights must be positive numbers such that = 4w 2 . No No 




{(-1,-1, 1,0), (=,-f,0, 1)} (a)y = -ijc (b)x = t,y = -2t,z = -3t 

31. (a) { (b) ||p|| = i 33. (a) 0 (b) ||p|| = 51 (a) v = a (1,-1) (b) v = a(l, -2) 

l|t * 11 = *7s ||q|| = 

True/False 6.2 

(a) False (b) True (c) True (d) True (e) False (f) False 


Exercise Set 6.3 (page 376) 

(a) Orthogonal but not orthonormal (b) Orthogonal and orthonormal 

(c) Not orthogonal and not orthonormal (d) Orthogonal but not orthonormal 

3. (a) Orthogonal (b) Not orthogonal 

An orthonormal basis: { ( 3 . 0, 0, (0. 1, 0)| 

7. u = — yV, — jV 2 + 2 V 3 9. u = 0 Vl - |V 2 + \y 3 11. (-“, -|, 2) 13. (0,-§,{) 

(a)(i>D (b)(-|.§) (»)(!.§) (b)H.|) ( a > (t’ I’ f) 0») (f, -!•-!) 


qi_ (sAo’ vho)’ q2 _ (vTo’ Ao) 


23 .( 1 . §, - 1 ,- 1 ) 25 . 

|(ir jl) ’ ^’°) ’ (^’ 



31 ' K 0, 71’ V5’°) ’ (^’ V5o’ V5o’ °) ’ (vho’ Jw' 3w' Jto) ’ (ji5’ 73’ Jb’VTs)} 

From Exercise 23, wi = proj^b = (|, — 1, — l), so w 2 = b — proj^b = (— 1, A, 1, —1). 

w ‘ = (H, f) ’"7 = I) An orthonormal basis: j(i, , (jg, . (^>-^- 0 )) 

For example, x = (V, 0^ and y = ( 0 , (b) proj^u = (2, 1, 2) (using both methods) 

43. An orthonormal basis: $\{1,\ \sqrt{3}(-1 + 2x),\ \sqrt{5}(1 - 6x + 6x^2)\}$ 


R = 

Vs 

0 

vr 

x/5_ 

(Q is given) 

R = 

-V2 

0 

V2 

x/3 

Vr 
_ 1 

V3 






0 

0 

-s/ 6 - 


49. $A$ does not have a QR-decomposition. 


($Q$ is given) 

(b) The range of $T$ is $W$; the kernel of $T$ is $W^{\perp}$. 


True/False 6.3 

(a) False (b) False (c) True (d) True (e) False (f) True 




Exercise Set 6.4 (page 386) 


"21 

25" 

Xi 


'20' 

25 

35 

x 2 _ 


20 


5. X I = 12, X 2 = —3, X} 


1. Least squares error vector: 


2l”i 
" n 

27 

" 11 
15 

11 J 


; least squares error: $\frac{3}{11}\sqrt{110} \approx 2.86$ 


r 3“| 


9. Least squares error vector: 


-3 

0 


; least squares error: $3\sqrt{3} \approx 5.196$ 


3. 


11. Least squares solutions: x t 


2 


\t, x 2 = t; error vector: 


0 

2 


13. Least squares solutions: x\ 


— \—t,x 2 = I — t , X 3 = f; error vector: 



~ 92 ~ 


3" 





'1 

0 

o' 



285 




"l 

o’ 






1 

439 

17. 

-4 

19. 



21 . 

0 

0 

0 

23. 

7 

285 




0 

0 






18 

94 

57 - 


_-l_ 





_0 

0 

1 _ 


_ 7 _ 



" 10 

15 

-5' 


' 0" 






-1 

(a) {(1.0, -5), (0, 1,3)) (b) ± 

15 

26 

3 

27. 

1 


5 

3 

34_ 







1 


True/False 6.4 

(a) True (b) False (c) True (d) True (e) False (f) True (g) False 


Exercise Set 6.5 (page 393) 




S V = — + — 

' ' - v 21 T 7x 


True/False 6.5 

(a) False (b) True (c) True 


(aa t ) ‘a 


(h) True 


(d) False 




Exercise Set 6.6 (page 399) 

(a) $1 + \pi - 2\sin x - \sin 2x$ (b) $1 + \pi - \frac{2}{1}\sin x - \frac{2}{2}\sin(2x) - \cdots - \frac{2}{n}\sin(nx)$


(a) — 

e - 

3x 

(a) — 


1 

~~ 2 

(b) 1 


(b) 


le — 19 
12e - 12 



0.392 


0.00136 



(— l) 4 ) sin kx 


True/False 6.6 

(a) False (b) True (c) True (d) False (e) True 


Chapter 6 Supplementary Exercises (page 399) 

(a) (0, a, a, 0) with a ^0 (b) ±(o,^, ^,o) 

(a) The subspace of all matrices in $M_{22}$ with zeros on the main diagonal. 

(b) The subspace of all 2 x 2 skew-symmetric matrices. 

^(tI’O’Ti) ^ No (b) # approaches | 17. No 


Exercise Set 7.1 (page 407) 


(a) Orthogonal; A 1 = 


1 0 
0 -1 


(b) Orthogonal; A 1 = 


■Ji 
' -ft 


i -i 

s/2 
_ 1 _ 
V2 J 


(a) Not orthogonal (b) Orthogonal; A 1 = 


I- 1 

V2 

0 

1 -I 

•ft 


" 23" 

5 

1 

V6 

2 

V6 

•n/6 

7. T a (x) = 

18 

25 

1 

1 

1 


101 

L V3 

V3 

V3-J 


- 25 - 


; \\T a (x)|| = ||x|| = V38 


9 Yes 


a 1 + b 2 = i 13. (a) 


"-1 + 3VT 


r 1 -V 31 


r 1 “ 
V2 


“ 5-| 

ft 

_ 3 + -s/3 _ 

(b) 

1 + |\/3 

(a) 

3 

y/2 

_ 5 _ 

(b) 

7 

ft 
— 3_ 



1 5ft~ 

2 2 


1 3ft ~ 

2 2 


'1 

0 

0 

(a) 

1 

1 

+ 

1 

(b) 

1 

1 

1 

l' ° ^ 

19. 

1 

O O 

cos # 

— sin# 

sin# 

cos# 


21. (a) Rotations about the origin, reflections about any line through the origin, and any combination of these 

(b) Rotations about the origin, dilations, contractions, reflections about lines through the origin, and combinations of these 

(c) No; dilations and contractions 


(a) (p) 5 = (^, V2, ^), (q), = 2V2, - fj 

(b) llpll = vTT, d( p. q) = V 2 T, (p, q) = 0 


True/False 7.1 

(a) False (b) False (c) False (d) False (e) True (f) True (g) True (h) True 




Exercise Set 7.2 (page 416) 

$\lambda^2 - 5\lambda = 0$; $\lambda = 0$: one-dimensional; $\lambda = 5$: one-dimensional 
$\lambda^3 - 3\lambda^2 = 0$; $\lambda = 3$: one-dimensional; $\lambda = 0$: two-dimensional 
$\lambda^4 - 8\lambda^3 = 0$; $\lambda = 0$: three-dimensional; $\lambda = 8$: one-dimensional 



2 

7T 


'3 

o" 


" 4 

5 

0 

3“ 

5 


'25 

0 

O' 

p = 

V7 

77 

; P-'AP = 

P = 

0 

l 

0 

; P-'AP = 

0 

-3 

0 


VI 

2 


0 

10 










L v? 

V7_ 





3 

- 5 

0 

4 

5- 


_ 0 

0 

— 50_ 



1 

1 

tsj- 

1 

V6 

1 -i 

■s/3 


"3 

0 

O' 

P = 

i 

V2 

_ J_ 

•s/6 

1 

V3 

; = 

0 

3 

0 


0 

2 

V6 

1 

VsJ 


_0 

0 

0_ 



r 4 

5 

0 

3 

5 

0 " 


'-25 

0 

0 

0" 


3 

5 

0 

4 

5 

0 


0 

-25 

0 

0 

Ul 

73 

II 

0 

4 

5 

0 

3 

5 

; P-'AP = 

0 

0 

25 

0 


0 

3 

5 

0 

4 

5- 


0 

0 

0 

25_ 


15. (2) 

r 1 1 

V2 

1 

[ V2 V2] + ( 4) 

r 1 1 

VI 

1 

cV 

II 

" 1 1 - 

2 2 

1 1 

+ (4) 

" 1 1 - 

2 2 

1 1 


- y/2- 


-V2- 


- 2 2- 


-2 2- 



r 1 " 

VI 


r 1 "i 

V3 


r 1 n 

V6 

17. (-4) 

1 

VI 

0_ 

[ V V2 °] + (~ 4 ) 

1 

V3 

1 

[ V3 V3 V3] + (2) 

1 

V6 

2 



- VlJ 


L V6-I 


r 1 -i ol 

2 2 u 


" 1 1 1 - 

3 3 3 


- 1 1 1 “ 

6 6 3 

_i 1 0 

2 2 u 

+ (—4) 

1 1 1 

3 3 3 

+ (2) 

1 1 1 

6 6 3 

1 — 

0 

0 

0 

1 


1 1 1 

- 3 3 3- 


1 1 2 

-3 3 3- 


19. 

3 0 0 

0 3 4 

21. Yes 23. (a) 

" VI-1 “ 
4 - 2 V 2 


~-V2-l~ 

4 + 2 V 2 

(b) 

r— In 

VI 

1 


r 1 1 

VI 

1 


1 

O 

4^ 

U> 

1 


_4-2V2_ 


_ 4 + 2 V 2 _ 


L V2 J 


Lyd 


True/False 7.2 

(a) True (b) True (c) False (d) True (e) True (f) True (g) True 


Exercise Set 7.3 (page 427) 


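1. (a) $\begin{bmatrix} x_1 & x_2 \end{bmatrix} \begin{bmatrix} 3 & 0 \\ 0 & 7 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$

(b) $\begin{bmatrix} x_1 & x_2 \end{bmatrix} \begin{bmatrix} 4 & -3 \\ -3 & -9 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$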


(c) [xi x 2 x 3 ] 


' 9 

3 

-4 


3. $2x^2 + 5y^2 - 6xy$ 

Xl 


r 1 

VI 

1 

1 -1 

VI 

1 

yi 


X 2 


- 7! 

yl- 

yi 


; Q = 3y, 2 + y\ 


7. 


l~\ 

" 3 

2 
’ 3 

2 

3 J 


; e = yf + 4y? + 7y\ 


4" 


JCl 

1 

2 


X2 

4_ 


- X 3_ 




(a) [x y] 


' 2 i 

X 

+ [1 - 6 ] 


A 0 - 

_y_ 




+ (2) = 0 (b) [x >'] 


0 0 
0 1 


+ [7 -8] 


+ (-5) = 0 


(a) Ellipse (b) Hyperbola (c) Parabola (d) Circle

Hyperbola: $3y'^2 - 2x'^2 = 8$; $\theta = \sin^{-1}\left(\frac{2}{\sqrt{5}}\right) \approx 63.4°$

Hyperbola: $4x'^2 - y'^2 = 3$; $\theta = \sin^{-1}\left(\frac{3}{5}\right) \approx 36.9°$ 


17. (a) Positive definite (b) Negative definite (c) Indefinite 
15 Positive definite 21. Positive semidefinite 23. Indefinite 

l l l ~ 


n n(n— 1) n(n— 1) 

1 1 1 


n(n— 1) n 


n(n- 1) 


(d) Positive semidefinite (e) Negative semidefinite 
27. (a) Indefinite (b) Negative definite 29, k > 2 


(a) s 2 = x T 


x 35 A must have a positive eigenvalue of multiplicity 2. 


l 


L n(u—l) 


n(n— 1) 


True/False 7.3 

(a) True (b) False (c) True (d) True (e) False (f) True (g) True (h) True (i) True (j) True (k) True 
(1) False 


Exercise Set 7.4 (page 436) 

Maximum: $5$ at $(x, y) = (\pm 1, 0)$; minimum: $-1$ at $(x, y) = (0, \pm 1)$ 

Maximum: $7$ at $(x, y) = (0, \pm 1)$; minimum: $3$ at $(x, y) = (\pm 1, 0)$ 

Maximum: $9$ at $(x, y, z) = (\pm 1, 0, 0)$; minimum: $3$ at $(x, y, z) = (0, 0, \pm 1)$ 

Maximum: $\sqrt{2}$ at $(x, y) = (\sqrt{2}, 1)$ and $(x, y) = (-\sqrt{2}, -1)$; minimum: $-\sqrt{2}$ at $(x, y) = (-\sqrt{2}, 1)$ and $(x, y) = (\sqrt{2}, -1)$ 




13. Saddle point at (0, 0); relative maximum at (—1, 1) 

Relative minimum at (0, 0); saddle point at (2, 1); saddle point at (—2, 1) 17. x = y = q(x) = A. 


True/False 7.4 

(a) False (b) True (c) True (d) False (e) True 


Exercise Set 7.5 (page 443) 






1 i 2-3 f 


’ -2 i 4 

5 - i 



1 . 

+ 

1 

0 

3. 

-i -3 1 





2 + 3/ 1 2 


5 (a) (A) 13 £ (A*) 13 (b) (A) 22 / (A*) 22 


A -1 = 


3 

5 

L-5* 


_ 4 
5 

■f'j 


11. A~ l = 


L+('+^ 5 ) 


p = 


~ — 1+i 

l-in 


"3 

0 

O 

1 

V3 

1 

V6 

2 

"0 

1 

+1 

II 

- V3 

V6J 



J 





r -l-i 

1 +in 





" 0 

0 

r 



'-2 

0 

O' 

p = 

V6 

2 

V3 

1 

; P-'AP = 

2 

0 

0 

g 

p = 

l-i 

V3 

-1+i 

V6 

0 

; p~ 

l AP = 

0 

1 

0 


L V6 

V3 J 




1 

2 

0 



0 

0 

5 








_ V3 

V6 








0 

K) 

1 

U) 


- 1 _ i ~ 

19. 

i 0 1 

27. (c) B and C must commute 

V2 -s/2 

1 1 


1 

1 

1 

OJ 

1 

1 


L 72 


True/False 7.5 

(a) False (b) False (c) True (d) False (e) False 


Chapter 7 Supplementary Exercises (page 445) 








4 

0 

3“ 

-1 

- 4 

9 

12“ 

-3 

4- 

-1 

- 3 

4 - 


5 

5 


5 

25 

25 

5 

4 

5 

3 

= 

5 

4 

5 

3 

(b) 

9 

25 

4 

5 

12 

25 

= 

0 

4 

5 

3 

5 

-5 

5- 


- 5 

5- 


12 

3 

16 


3 

12 

16 







- 25 

5 

25- 


- 5 

25 

25- 



1 - 1 

V 2 

1 

V2 

O' 


"0 

0 

o' 

p = 

0 

0 

1 

; p t ap = 

0 

2 

0 


1 

- -s/2 

1 

V 2 

0 


_o 

0 

1 _ 


7 Positive definite 9. (a) Parabola (b) Parabola 

Two possible solutions: a = 0, Z? = J\<c= — and a = 0, b = —J c = 


Exercise Set 8.1 (page 456) 


1. (a) Nonlinear 

(b) Linear; kernel consists of all matrices of the form $\begin{bmatrix} a & b \\ c & -a \end{bmatrix}$

(c) Linear; kernel consists of all matrices of the form $\begin{bmatrix} 0 & b \\ -b & 0 \end{bmatrix}$

3. Nonlinear 5. Linear; kernel consists of all $2 \times 2$ matrices whose rows are orthogonal to all columns of $B$ 

(a) Linear; $\ker(T) = \{\mathbf{0}\}$ (b) Nonlinear

Linear; $\ker(T) = \{(0, 0, 0, \ldots)\}$

(a) and (d) 


(a) 2 (b) 4 (c) $mn - 3$ (d) 1 


(a) 



6 

9 


(b) rank(T) = 4; nullity(T) = 0 


(a) $(1, 0, 1)$ (b) $\ker(T) = \{\mathbf{0}\}$ (c) $R(T) = R^3$

$T(x_1, x_2) = (-4x_1 + 5x_2,\ x_1 - 3x_2)$; $T(5, -3) = (-35, 14)$ 


$T(x_1, x_2, x_3) = (-x_1 + 4x_2 - x_3,\ 5x_1 - 5x_2 - x_3,\ x_1 + 3x_3)$; $T(2, 4, -1) = (15, -9, -1)$ 


23. (b) {x,x 2 } (c) (5,x 2 ) 

25. (a) ker(D) consists of all constant polynomials 

(b) ker(7) consists of all polynomials of the form c/jx 

(a) $T(f(x)) = f^{(4)}(x)$ 

(b) $T(f(x)) = f^{(n+1)}(x)$ 




29. (a) The origin, a line through the origin, a plane through the origin, or the entire space R 3 
(b) The origin, a line through the origin, a plane through the origin, or the entire space R 3 

31. (-10, -7,6) 

True/False 8.1 

(a) True (b) False (c) True (d) False (e) True (f) True (g) False (h) False (i) False 


Exercise Set 8.2 (page 464) 

(a) $\ker(T) = \{\mathbf{0}\}$; $T$ is one-to-one (b) $\ker(T) = \{\mathbf{0}\}$; $T$ is one-to-one (c) $\ker(T) = \operatorname{span}\{(0, 1, 1)\}$; $T$ is not one-to-one 
(a) nullity(A) = 1; not one-to-one (b) nullity(A) = 1; not one-to-one 

5. (a) One-to-one (b) One-to-one (c) Not one-to-one 

For example, $T(1 - x^2) = (0, 0)$; $T$ is onto 

No; $T$ is not one-to-one because $\ker(T) \neq \{\mathbf{0}\}$, as $T(\mathbf{a}) = \mathbf{a} \times \mathbf{a} = \mathbf{0}$ 

$(T_2 \circ T_1)(x, y) = (2x - 3y,\ 2x + 3y)$ 13. $(T_3 \circ T_2 \circ T_1)(x, y) = (3x - 2y,\ x)$ 

(a) $a + d$ (b) $(T_2 \circ T_1)(A)$ does not exist because $T_1(A)$ is not a $2 \times 2$ matrix 
17. $a_0 x + a_1 x(x + 1) + a_2 x(x + 1)^2$ 
(a) $(1, -1)$ (d) $T^{-1}(2, 3) = 2 + x$ 



21. (a) All the $a_i$'s must be nonzero 


(b) $T^{-1}(x_1, x_2, \ldots, x_n) = \left(\frac{1}{a_1}x_1, \frac{1}{a_2}x_2, \ldots, \frac{1}{a_n}x_n\right)$ 



(a) $T_1^{-1}(p(x)) = \frac{1}{x}p(x)$; $T_2^{-1}(p(x)) = p(x - 1)$; $(T_1^{-1} \circ T_2^{-1})(p(x)) = \frac{1}{x}p(x - 1)$ 
r 2 (v) = Since ker(/) 7 ^ (0), J is not one-to-one. 


True/False 8.2 

(a) True (b) False (c) True (d) True (e) False (f) True 


Exercise Set 8.3 (page 471) 


1. Isomorphism 3, Isomorphism 5 Not an isomorphism 

a 



/ 

a b c 

\ 

(a) T 


b d e 



V 

_c e f _ 

) 





7. Isomorphism 




a 

c 


b 

d. 


Isomorphism

$\dim(W) = 3$; $(-r - s - t, r, s, t) \to (r, s, t)$ is an isomorphism between $W$ and $R^3$ 

15. Isomorphism 17. Yes 19. No 


True/False 8.3 

(a) False (b) True (c) False (d) True (e) True (f) True 




Exercise Set 8.4 (page 479) 


1 (a) 


(a) [r(vi)] a = 


(c) T 


-0 

0 

0“ 




r 









1 

-1 

1 


0 

o" 


1 

1 

1 

1 

0 

0 






1 

1 





0 

1 

0 

(a) 

0 

1 

-2 

(a) 

~2 

(a) 

0 

2 

4 


_0 

0 

1_ 


8 

4 


_0 

0 

4 

.0 

0 

1_ 






_ 3 

3 _ 






(b), (c) 3 3- 1 Ox 3- 16x 2 


1 

-2 


; [7'(v 2 )] b = 


Xl 


18 

7 

1 - 

7 


Xl 

X 2 

)- 

107 

7 

24 

7 - 


X 2 


(b) r(vi) = 
(d) T ; | 


3 

-5 
r " 19" 

7 
83 


; r(v 2 ) = 


-2 

29 



T 


3' 


"-r 

(a) [T(v,)] b = 

2 

; [T(y 2 )] B = 

0 

; [7\V3)] B = 

5 


6 


-2 


4_ 


(b) T(m) = 16 + 51x + 19x 2 ; T(y 2 ) = -6 - 5x + 5x 2 ; T(y 3 ) = 7 + 40x + 15x 2 


(c) T(a 0 + a\X + a 2 x 2 ) = 


239a 0 — 161ai 3 - 289a 2 20 lao — 1 1 Icq -l - 247a 2 61 ciq — 3 luq -}- 1 07 ci 2 2 

2 3 x 3 — x 


24 


12 


(d) T{ 1 3- x 1 ) = 22 4- 56x3- 14x 2 


(a) [T 2 o 7\] . = 


(a) m B B - = 


17. (a) 


‘0 

0" 






'0 

0 

0" 




'2 


O' 





6 

0 






3 

0 

0 

0 

-9 

> [^l] B ", B — 

0 


3 


lT 2 ] B 'y = 

0 

3 

0 




_0 


0 _ 





.0 

0 _ 






.0 

0 

3_ 

0 

0" 


“1 

1 

r 




1 

1 


1 

2 

2 


’2 

5" 


-1 

1 

; m B , B " = 

1 

0 



(b), (c) 

1 



2 


2 


0 

0 _ 


_1 

1 

1 . 





(b) B = [T 2 ] B , B 


'0 

1 

0" 



"0 

0 

o' 

0 

0 

2 

(b) -6 3- 48x 

(a) 

0 

0 

-1 

_0 

0 

0_ 



_0 

1 

0_ 


(b) 4 sin x 3- 3 cos x 


(a) \T 2 o Tdj' B = [r 2 ]^ (b) [r 3 o T 2 o >a = [T } ] B ' y [T 2 ] B ^y fl 

23. The matrix for $T$ relative to $B$ is the matrix whose columns are the transforms of the basis vectors in $B$ in terms of the standard 
basis. Since $B$ is the standard basis for $R^n$, this matrix is the standard matrix for $T$. Also, since $B'$ is the standard basis for $R^m$, the 
resulting transformation will give vector components relative to the standard basis. 


True/False 8.4 

(a) False (b) False (c) True (d) False (e) True 


Exercise Set 8.5 (page 486) 

(a) $\det(A) = -2$ does not equal $\det(B) = -1$ (b) $\operatorname{tr}(A) = 3$ does not equal $\operatorname{tr}(B) = -2$ 



'6 

-IO" 


"-2 -2 


'l -2 


"11 20 " 

3. 

2 

-3 

5. 

6 5 

[7 ] B - 

0 -1 

; m B - = 

-6 -11 


'-2 

-1 

o' 


"-2 

-1 

0 " 


r 1 

1 -1 


r 1 

1 -1 

1 

0 

1 

; m B ' = 

1 

0 

1 

m B = 

\/2 

1 

y/2 

1 

; m B ' = 

V2 

1 

V2 

1 

0 

1 

0_ 


0 

1 

o_ 


LV2 

VlJ 


LVI 

VlJ 


!'/']« - 




"-1 0 " 


"1 1 - 

2 2 

1 1 

; [TV = 

3 1 



-2 2 - 


15. (a) $-4$, $3$ 

(b) A basis for the eigenspace corresponding to $\lambda = -4$ is $\{-2 + \frac{8}{3}x + x^2\}$; 

A basis for the eigenspace corresponding to $\lambda = 3$ is $\{5 - 2x + x^2\}$ 

$\det(T) = 17$; eigenvalues: $5 \pm 2\sqrt{2}$

$\det(T) = 1$; eigenvalue: $1$ 

True/False 8.5 

(a) False (b) True (c) True (d) True (e) True (f) False (g) True (h) False 


Chapter 8 Supplementary Exercises (page 488) 

1. No 

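(a) $T(\mathbf{e}_3)$ and any two of $T(\mathbf{e}_1)$, $T(\mathbf{e}_2)$, $T(\mathbf{e}_4)$ form a basis for the range; a basis for $\ker(T)$ is $\left\{ \begin{bmatrix} -1 \\ 1 \\ 0 \\ 1 \end{bmatrix} \right\}$

(b) rank(T) = 3; nullity(T) = 1 
(a) rank(T) = 2; nullity(T) = 2 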

l! rank(J) = 3; nuUity(r) = 1 


(b) T is not one-to-one 


25 . 


0 0 0 
1 0 0 
0 i 0 
0 0 | 

0 0 
0 0 0 


0 ii+T J 


-1 

0 

0 

O' 














'-4 

0 

9 ~ 


'1 

-1 

r 

0 

0 

1 

0 

15 . 

1 

0 

-2 

17 . 

0 

1 

0 

0 

1 

0 

0 


0 

1 

1_ 


1 

0 

-i 

.0 

0 

0 

1. 










(b) jl,x) 


Exercise Set 9.1 (page 499) 


xi = 2 , x 2 - 

= 1 

3 . X \ 

= 3 , x 2 = 

-1 

5 . 

Xi = 

— 1, x 2 = 1, x 2 = 

= 0 







- i 

1 

7 - 


5 

1 

7 - 


1 

0 0 ' 


2 

8 

48 


48 

48 

48 

(a) L- 1 - 

-2 

1 0 

; U~ l = 

0 

1 

4 

5 

24 

(b) A- 1 = 

7 

24 

11 

24 

5 

24 


1 

1 1 


_0 

0 

1 

6- 


1 

6 

1 

6 

1 

6- 


(a) A = LU = 


2 

-2 

2 


l 

0 

0 " 


'2 

1 


-l 

l 

0 


0 

0 

1 

l 

0 

1 _ 


_0 

0 

1 _ 


in 
' 2 

1 

1 


(b) A = L\DU\ = 


1 

-1 

1 


- 

'2 

0 

o' 



0 

1 

0 


J 

_0 

0 

1_ 



(c) A = L 2 U 2 = 




11 v _ 21 _ 14 
1 1 X\ — ts, X2— — T,i 


v _ 12 
-* 3-17 


A = LDU = 


"l 

o" 

"2 

o" 

"l 

f 

2 

1 

0 

-3 

0 

1 



'1 0 O' 


"3 0 0" 


'1 -| o- 

A = PLU = 

0 0 1 


0 2 0 


0 1 I 


1 

O 

O 

1 


_3 0 1_ 


1 

o 

O 

1 


; jci 


= - 5 . X 2 = \ Xi= 3 


1 Approximately |n 3 additions and multiplications are required 


True/False 9.1 

(a) False (b) False (c) True (d) True (e) True 


Exercise Set 9.2 (page 508) 


(a) = -8 is the dominant eigenvalue (b) no dominant eigenvalue 



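$\mathbf{x}_1 \approx \begin{bmatrix} 0.98058 \\ -0.19612 \end{bmatrix}$, $\lambda^{(1)} \approx 5.15385$; $\mathbf{x}_2 \approx \begin{bmatrix} 0.98837 \\ -0.15206 \end{bmatrix}$, $\lambda^{(2)} \approx 5.16185$;

$\mathbf{x}_3 \approx \begin{bmatrix} 0.98679 \\ -0.16201 \end{bmatrix}$, $\lambda^{(3)} \approx 5.16226$; $\mathbf{x}_4 \approx \begin{bmatrix} 0.98715 \\ -0.15977 \end{bmatrix}$, $\lambda^{(4)} \approx 5.16228$;

dominant eigenvalue: $2 + \sqrt{10} \approx 5.16228$; 

corresponding unit eigenvector: $\frac{1}{\sqrt{20 + 6\sqrt{10}}}\left(3 + \sqrt{10},\ -1\right) \approx (0.98709, -0.16018)$ 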
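The iterates listed above follow the power method with Euclidean scaling: multiply by $A$, normalize, and estimate the eigenvalue with the Rayleigh quotient. The sketch below is a minimal illustration under assumptions: the matrix $A$ and the starting vector $\mathbf{x}_0 = (1, 0)$ are chosen because they are consistent with the listed values (this matrix has dominant eigenvalue $2 + \sqrt{10}$), not because they are quoted from the exercise. The function names are illustrative.

```python
import math

def mat_vec(A, x):
    """Return the product A x for a matrix given as a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def power_method(A, x0, steps):
    """Power method with Euclidean scaling.

    Returns a list of (unit vector x_k, Rayleigh quotient lambda_k) pairs.
    """
    x = x0
    results = []
    for _ in range(steps):
        y = mat_vec(A, x)
        norm = math.sqrt(sum(c * c for c in y))
        x = [c / norm for c in y]              # x_k = A x_{k-1} / ||A x_{k-1}||
        Ax = mat_vec(A, x)
        lam = sum(a * b for a, b in zip(x, Ax))  # x_k . (A x_k), since ||x_k|| = 1
        results.append((x, lam))
    return results

# Assumed symmetric matrix and starting vector (see the lead-in).
A = [[5.0, -1.0],
     [-1.0, -1.0]]
iterates = power_method(A, [1.0, 0.0], 4)

for k, (x, lam) in enumerate(iterates, start=1):
    print(k, [round(c, 5) for c in x], round(lam, 5))
```

Because the assumed matrix is symmetric, the Rayleigh quotients converge roughly twice as fast (in digits) as the vectors, which matches the pattern in the listed approximations.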


x i = 


x 4 - 


-1 

1 


, * (1) = 6; x 2 = 


-0.53488" 

1 


, k (2) = 6.6; x 3 k 


-0.53846 

1 


,X (4) Ri 6.60555; 


dominant eigenvalue: 3+x/l3^ 6.60555 ; 

corresponding scaled eigenvector: ( 1) ~ (—0.53518, 1) 


k (3) RK 6.60550; 


l" 


r 


l 


; x 2 = 


; x 3 rk 


-0.5 


- 0.8 


-0.929 


(b) $\lambda^{(1)} = 2.8$; $\lambda^{(2)} \approx 2.976$; $\lambda^{(3)} \approx 2.997$ 

(c) eigenvector: (1, —1); eigenvalue: 3 

(d) 0.1% 


9. 2.99993 


0.99180 

1.00000 


(a) Starting with x 0 = 


T 

0 

0 


it takes 8 iterations. 


(b) Starting with x 0 = 


T 

0 

0 

0 


it takes 8 iterations. 


Exercise Set 9.3 (page 513) 

(a) ≈ 0.067 second (b) ≈ 66.68 seconds (c) ≈ 66,668 seconds, or about 18.5 hours 

3. (a) ≈ 9.52 seconds (b) ≈ 0.0014 second (c) ≈ 9.52 seconds (d) ≈ 28.57 seconds 

(a) about $6.67 \times 10^5$ seconds for forward phase; about 10 seconds for backward phase 

(b) 1334 gigaflops per second 

7. $n^2$ flops 9. $2n^3 - n^2$ flops 




Exercise Set 9.4 (page 520) 

1. $\sqrt{5}$, $0$ 3. $\sqrt{5}$ 


A = 


r 1 

-s/2 

1 "I 

s/2 

's/2 

o" 

"l 

o’ 

A = 

r 2 

s/5 

1 "I 

s/5 

"8 

o" 

r 1 

s/5 

2 “I 

v's 

1 

-s/2 

1 

s/2- 

0 

V2_ 

0 

1 


1 

-s/5 

2 

s/5 - 

0 

2 

_ 2 
- V5 

1 

s/5- 


A = 


V2 


vr 

6 


0 


2 

L "3 


V2 


s/2 

6 J 


'3V2 o' 
0 0 

0 0 


" V2 -v/2 

J_ _1_ 

L V2 V2. 


(b) A = 


V2 

0 " 

r 1 

s^ 

1 -i 

s/2 

0 

V2 _ 

1 

-s/2 

1 

s/2- 


A = 


Jb 0 


V3 

J_ 

" V3 


_1_ 

V2 

_ 1 _ 

V2 


_ 2 _- 

V6 

' v/6 

J_ 

V6- 


'•s/3 0 ' 

0 -Jl 

0 0 


1 0 
0 1 


True/False 9.4 

(a) False (b) True (c) False (d) False (e) True (f) False (g) True 


Exercise Set 9.5 (page 524) 


A = 

- 2“ 

3 

1 

3 

M[-7I 7l] 

1 

- SJ- 

0 " 

1 

V2 


i 

o ujI 

rSl o 

i 

"l o’ 

0 1 

5. A = 3\/2 

- 2" 

3 

1 

3 


2 

- 3- 


- vT 

71- 




2 

- 3- 


1! 

1 

id- 

I 

[i 

0] + V2 

' 0 " 

i 

V2 


_ 1 

L V3- 



1 

L-s/2-I 


9. 70,100 numbers must be stored; A has 100,000 entries 


s/2 


] 


True/False 9.5 

(a) True (b) True (c) False 


Chapter 9 Supplementary Exercises (page 524) 


A = 


2 

o" 

"-3 

l" 

-2 

1 

0 

2 


A = 


'2 0 O' 


"12 3 

1 2 0 


0 1 2 

1 

K> 

1 


1 

O 

O 
1 


5 (a) dominant eigenvalue: 3, corresponding positive unit eigenvector: 
(b) x 5 


•s/2 

VzJ 


(c) x 5 


; v i 


0.7071 

0.7071 


0.7100 
0.7042 
"l 

0.9918 

7 The Rayleigh quotients will slowly converge to the dominant eigenvalue k 4 = 


's/2 


"s/2 


V2 

0 

J_ 

' y/2. 



"2 

0" 

r 


0 

0 



o 

1 

o 

- 


' V2 
'V2 


' V2 

-s/2 - 


11. A = 


1 n 

2 

1 

' 2 
1 

' 2 
\_ 

2-1 


- 8 . 1 . 

24 O' 
0 12 


2 

3 

2 

L 3 


2" 

3 

1 

3 - 


9. A = 




Exercise Set 10.1 (page 532) 

(a) $y = 3x - 4$ (b) $y = -2x + 1$ 

(a) $x^2 + y^2 - 4x - 6y + 4 = 0$ or $(x - 2)^2 + (y - 3)^2 = 9$ (b) $x^2 + y^2 + 2x - 4y - 20 = 0$ or $(x + 1)^2 + (y - 2)^2 = 25$

3. $x^2 + 2xy + y^2 - 2x + y = 0$ (a parabola)

(a) $x + 2y + z = 0$ (b) $-x + y - 2z + 1 = 0$ 


(a) 


x y 

Z 

0 


Xi yi 

Z 1 

1 

1 

= 0 

*2 yi 

Z2 


_x 3 y 3 

Z 3 

1_ 


+ 

+ 

1 

2x 

— 4y 


10 . 


(b) 

x 2 + y 

2 + 

y 

X 2 

X 

yi 

x? 

Xi 

yi 

x| 

x 2 

yi 

x 3 2 

x 3 


(b) x + 2y + z = 0; — x + y — 2z = 0 


$z^2 - 2x - 2y = 3$ or $(x - 1)^2 + (y - 1)^2 + z^2 = 5$ 
1 
1 
1 
1 


= 0 The equation of the line through the three collinear points 0 = 0 


13. The equation of the plane through the four coplanar points 


Exercise Set 10.2 (page 539) 

1. 700 2, (a) 5 (b) 4 

(a) Ox, || units; sheep, || unit 

(b) First kind, |= measure; second kind, 2L measure; third kind, 4L measure 


(a) xi = 


(tf? A- ci 3 A- A- d n ) — cii 
n — 2 


, Xi = a, — xi, i = 2, 3 « 


(b) Exercise 7(b); gold, 30 4 minae; brass, 94 minae; tin, 14| minae; iron, 54 


minae 


(a) $5x + y + z - K = 0$ 
$x + 7y + z - K = 0$ 
$x + y + 8z - K = 0$ 

$x = \frac{21t}{131}$, $y = \frac{14t}{131}$, $z = \frac{12t}{131}$, $K = t$, where $t$ is an arbitrary number 

(b) Take $t = 131$, so that $x = 21$, $y = 14$, $z = 12$, $K = 131$. 

(c) Take $t = 262$, so that $x = 42$, $y = 28$, $z = 24$, $K = 262$. 
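The one-parameter family of solutions above can be checked with exact rational arithmetic. This is a small sketch: the parametrization $x = 21t/131$, $y = 14t/131$, $z = 12t/131$, $K = t$ is read off from parts (b) and (c), and the function names `solution` and `residuals` are illustrative.

```python
from fractions import Fraction

def solution(t):
    """Return (x, y, z, K) for parameter t, using exact fractions."""
    t = Fraction(t)
    return (Fraction(21, 131) * t,
            Fraction(14, 131) * t,
            Fraction(12, 131) * t,
            t)

def residuals(x, y, z, K):
    """Left-hand sides of the three homogeneous equations; all should be 0."""
    return (5 * x + y + z - K,
            x + 7 * y + z - K,
            x + y + 8 * z - K)

for t in (131, 262, 1):
    x, y, z, K = solution(t)
    print(t, (x, y, z, K), residuals(x, y, z, K))
```

Choosing $t$ as a multiple of 131 is what makes all four unknowns integers, which is why parts (b) and (c) use $t = 131$ and $t = 262$.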

(a) Legitimate son, 577 1 staters; illegitimate son, 422 1 staters 

(b) Gold, 30( minae; brass, 9| minae; tin, 14( minae; iron, 5| minae 

(c) First person, 45; second person, 37 third person, 22^ 


Exercise Set 10.3 (page 549) 

(a) $S(x) = -.12643(x - .4)^3 - .20211(x - .4)^2 + .92158(x - .4) + .38942$ 

(b) $S(.5) = .47943$; error = 0% 

(a) The cubic runout spline (b) S(x) = 3x 3 — 2x 2 + 5x + 1 

$S(x) = \begin{cases}
-.00000042(x + 10)^3 + .000214(x + 10) + .99815, & -10 \le x \le 0 \\
.00000024x^3 - .0000126x^2 + .000088x + .99987, & 0 \le x \le 10 \\
-.00000004(x - 10)^3 - .0000054(x - 10)^2 - .000092(x - 10) + .99973, & 10 \le x \le 20 \\
.00000022(x - 20)^3 - .0000066(x - 20)^2 - .000212(x - 20) + .99823, & 20 \le x \le 30
\end{cases}$

Maximum at $(x, S(x)) = (3.93, 1.00004)$ 
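The location of the maximum can be recovered from the cubic piece on $0 \le x \le 10$ by solving $S'(x) = 0$. The sketch below uses the rounded coefficients printed in the answer, so it can only be expected to agree with the stated maximum to the displayed precision; it also assumes, rather than proves, that the maximum lies on this piece. Variable names are illustrative.

```python
import math

# Coefficients of the piece on 0 <= x <= 10, copied from the answer above:
# S(x) = a*x**3 + b*x**2 + c*x + d
a, b, c, d = 0.00000024, -0.0000126, 0.000088, 0.99987

def S(x):
    """Evaluate the cubic piece with Horner's rule."""
    return ((a * x + b) * x + c) * x + d

# Critical points solve S'(x) = 3a x^2 + 2b x + c = 0 (quadratic formula).
disc = (2 * b) ** 2 - 4 * (3 * a) * c
roots = [(-2 * b + sign * math.sqrt(disc)) / (2 * 3 * a) for sign in (-1, 1)]

# Keep the root inside the interval where S''(x) = 6a x + 2b is negative.
candidates = [r for r in roots if 0 <= r <= 10 and 6 * a * r + 2 * b < 0]
x_max = candidates[0]
s_max = S(x_max)

print(round(x_max, 2), round(s_max, 5))
```

The knot values .99815, .99987, .99973, and .99823 are all below the value found here, which supports the assumption that the maximum is interior to the second piece.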

$S(x) = \begin{cases}
.00000009(x + 10)^3 - .0000121(x + 10)^2 + .000282(x + 10) + .99815, & -10 \le x \le 0 \\
.00000009x^3 - .0000093x^2 + .000070x + .99987, & 0 \le x \le 10 \\
.00000004(x - 10)^3 - .0000066(x - 10)^2 - .000087(x - 10) + .99973, & 10 \le x \le 20 \\
.00000004(x - 20)^3 - .0000053(x - 20)^2 - .000207(x - 20) + .99823, & 20 \le x \le 30
\end{cases}$

Maximum at $(x, S(x)) = (4.00, 1.00001)$ 


i 2 = 25 




(a) $S(x) = \begin{cases} -4x^3 + 3x, & 0 \le x \le 0.5 \\ 4x^3 - 12x^2 + 9x - 1, & 0.5 \le x \le 1 \end{cases}$

(b) $S(x) = \begin{cases} 2 - 2x, & 0.5 \le x \le 1 \\ 2 - 2x, & 1 \le x \le 1.5 \end{cases}$

(c) The three data points are collinear. 


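(b)

$$\begin{bmatrix}
4 & 1 & 0 & 0 & \cdots & 0 & 0 & 0 & 1 \\
1 & 4 & 1 & 0 & \cdots & 0 & 0 & 0 & 0 \\
0 & 1 & 4 & 1 & \cdots & 0 & 0 & 0 & 0 \\
\vdots & \vdots & \vdots & \vdots & & \vdots & \vdots & \vdots & \vdots \\
0 & 0 & 0 & 0 & \cdots & 0 & 1 & 4 & 1 \\
1 & 0 & 0 & 0 & \cdots & 0 & 0 & 1 & 4
\end{bmatrix}
\begin{bmatrix} M_1 \\ M_2 \\ M_3 \\ \vdots \\ M_{n-2} \\ M_{n-1} \end{bmatrix}
= \frac{6}{h^2}
\begin{bmatrix}
y_{n-1} - 2y_1 + y_2 \\
y_1 - 2y_2 + y_3 \\
y_2 - 2y_3 + y_4 \\
\vdots \\
y_{n-3} - 2y_{n-2} + y_{n-1} \\
y_{n-2} - 2y_{n-1} + y_1
\end{bmatrix}$$

(b)

$$\begin{bmatrix}
2 & 1 & 0 & 0 & \cdots & 0 & 0 & 0 & 0 \\
1 & 4 & 1 & 0 & \cdots & 0 & 0 & 0 & 0 \\
0 & 1 & 4 & 1 & \cdots & 0 & 0 & 0 & 0 \\
\vdots & \vdots & \vdots & \vdots & & \vdots & \vdots & \vdots & \vdots \\
0 & 0 & 0 & 0 & \cdots & 0 & 1 & 4 & 1 \\
0 & 0 & 0 & 0 & \cdots & 0 & 0 & 1 & 2
\end{bmatrix}
\begin{bmatrix} M_1 \\ M_2 \\ M_3 \\ \vdots \\ M_{n-1} \\ M_n \end{bmatrix}
= \frac{6}{h^2}
\begin{bmatrix}
-h y_1' - y_1 + y_2 \\
y_1 - 2y_2 + y_3 \\
y_2 - 2y_3 + y_4 \\
\vdots \\
y_{n-2} - 2y_{n-1} + y_n \\
y_{n-1} - y_n + h y_n'
\end{bmatrix}$$
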

Exercise Set 10.4 (page 559) 


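(a) $\mathbf{x}^{(1)} = \begin{bmatrix} .4 \\ .6 \end{bmatrix}$, $\mathbf{x}^{(2)} = \begin{bmatrix} .46 \\ .54 \end{bmatrix}$, $\mathbf{x}^{(3)} = \begin{bmatrix} .454 \\ .546 \end{bmatrix}$, $\mathbf{x}^{(4)} = \begin{bmatrix} .4546 \\ .5454 \end{bmatrix}$, $\mathbf{x}^{(5)} = \begin{bmatrix} .45454 \\ .54546 \end{bmatrix}$

(b) P is regular since all entries of P are positive; $\mathbf{q} = \begin{bmatrix} \frac{5}{11} \\ \frac{6}{11} \end{bmatrix}$

(a) $\mathbf{x}^{(1)} = \begin{bmatrix} .7 \\ .2 \\ .1 \end{bmatrix}$, $\mathbf{x}^{(2)} = \begin{bmatrix} .23 \\ .52 \\ .25 \end{bmatrix}$, $\mathbf{x}^{(3)} = \begin{bmatrix} .273 \\ .396 \\ .331 \end{bmatrix}$

(b) P is regular, since all entries of P are positive; $\mathbf{q} = \begin{bmatrix} \frac{22}{72} \\ \frac{29}{72} \\ \frac{21}{72} \end{bmatrix}$

(a) $\begin{bmatrix} \frac{9}{17} \\ \frac{8}{17} \end{bmatrix}$ (b) $\begin{bmatrix} \frac{26}{45} \\ \frac{19}{45} \end{bmatrix}$ (c) $\begin{bmatrix} \frac{3}{19} \\ \frac{4}{19} \\ \frac{12}{19} \end{bmatrix}$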


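(a) $P^n = \begin{bmatrix} \left(\frac{1}{2}\right)^n & 0 \\ 1 - \left(\frac{1}{2}\right)^n & 1 \end{bmatrix}$, $n = 1, 2, \ldots$

Thus, no integer power of P has all positive entries.

(b) $P^n \to \begin{bmatrix} 0 & 0 \\ 1 & 1 \end{bmatrix}$ as n increases, so $P^n \mathbf{x}^{(0)} \to \begin{bmatrix} 0 \\ 1 \end{bmatrix}$ for any $\mathbf{x}^{(0)}$ as n increases.

(c) The entries of the limiting vector $\begin{bmatrix} 0 \\ 1 \end{bmatrix}$ are not all positive.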



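$P^2 = \begin{bmatrix} \frac{1}{2} & \frac{1}{4} & \frac{1}{4} \\ \frac{1}{4} & \frac{1}{2} & \frac{1}{4} \\ \frac{1}{4} & \frac{1}{4} & \frac{1}{2} \end{bmatrix}$ has all positive entries; $\mathbf{q} = \begin{bmatrix} \frac{1}{3} \\ \frac{1}{3} \\ \frac{1}{3} \end{bmatrix}$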


8. 54 1/6% in region 1, 16 2/3% in region 2, and 29 1/6% in region 3




Exercise Set 10.5 (page 568) 



0 

0 



'0 

1 

1 

0 

o' 

'0 

r 







0 

0 

0 

0 

1 

1 

0 

1 

i 







1 

1 

0 

i 

(b) 

1 

0 

0 

1 

0 





0 

0 

1 

0 

0 

0 

0 

0 

0 


_0 

0 

1 

0 

0_ 





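$$\begin{bmatrix}
0 & 1 & 0 & 1 & 0 & 0 \\
1 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 1 & 1 & 1 \\
0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 1 & 0 & 1 & 0
\end{bmatrix}$$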




P* 

Pa 


4. (a) $\begin{bmatrix}
1 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 \\
0 & 0 & 1 & 1 & 0 \\
0 & 0 & 1 & 2 & 1 \\
0 & 0 & 0 & 1 & 2
\end{bmatrix}$

(c) The ijth entry is the number of family members who influence both the ith and jth family members.


5. (a) $\{P_1, P_2, P_3\}$ (b) $\{P_3, P_4, P_5\}$ (c) $\{P_2, P_4, P_6, P_8\}$ and $\{P_4, P_5, P_6\}$   6. (a) None (b) $\{P_3, P_4, P_6\}$


$$\begin{bmatrix}
0 & 0 & 1 & 1 \\
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 1 \\
0 & 1 & 0 & 0
\end{bmatrix}$$

Power of $P_1$ = 5
Power of $P_2$ = 3
Power of $P_3$ = 4
Power of $P_4$ = 2
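The four powers listed above agree with the row sums of M + M^2, where M is the 4 x 4 vertex matrix of the dominance-directed graph (rows 0 0 1 1, 1 0 0 0, 0 1 0 1, 0 1 0 0). The following is a small sketch of that computation; the function name `vertex_powers` is illustrative only.

```python
def vertex_powers(m):
    """Row sums of M + M^2: one-step plus two-step dominances for each vertex."""
    n = len(m)
    m2 = [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
          for i in range(n)]
    return [sum(m[i][j] + m2[i][j] for j in range(n)) for i in range(n)]

# Vertex matrix from the answer above: entry (i, j) is 1 when P_i dominates P_j.
M = [
    [0, 0, 1, 1],
    [1, 0, 0, 0],
    [0, 1, 0, 1],
    [0, 1, 0, 0],
]

powers = vertex_powers(M)
print(powers)
```

Ranking the vertices by these row sums gives the order P1, P3, P2, P4.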


8. First, A; second, B and E (tie); fourth, C; fifth, D 


Exercise Set 10.6 (page 578) 

1. (a) -5/8 (b) $[0 \;\; 1 \;\; 0]$ (c) $[1 \;\; 0 \;\; 0 \;\; 0]^T$   2. Let $A = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}$, for example.


(a) p* = [0 1], q* = 

(c) p* = [0 0 1], q* = 

4 (a) p* = [1 |], q* = 


, v = 3 (b) p* = [0 1 0], q’ = 

v = 2 (d) p* = [0 1 0 0], q’ = 


, V = 2 


, v = —2 


= f ( b >p* = [f \l f = 


i- 

6 

5 

L 6 J 






(c) p* = [1 0], q* 


1 

0 


u = 3 (d)p* = [§ §], q* 


Ce)P* = [s If]. 1* 

P* = [| Tol <1* = 


13 

_ 12 

- 13 - 

" 11 “ 

20 

A ’ 

- 20 - 


, V = — 


V = — 


_ 3 _ 

20 


29 

13 



U = 


19 

5 


Exercise Set 10.7 (page 586) 


(a) 

~2~ 

72 

(b) 

'6' 

5 

(c) 

"78" 

54 


■J 


6 


79 


2. (a) Use Corollary 10.8.4; all row sums are less than one.

(b) Use Corollary 10.8.5; all column sums are less than one.

(c) Use Theorem 10.8.3, with $\mathbf{x} = \begin{bmatrix} 2 \\ 1 \\ 1 \end{bmatrix} > C\mathbf{x} = \begin{bmatrix} 1.9 \\ .9 \\ .9 \end{bmatrix}$.


3. E^2 has all positive entries.   Price of tomatoes, $120.00; price of corn, $100.00; price of lettuce, $106.67
$1256 for the CE, $1448 for the EE, $1556 for the ME 6 (b) f§ 


Exercise Set 10.8 (page 594) 

1. The second class; $15,000   2. $223   3. 1 : 1.90 : 3.02 : 4.24 : 5.00

s/(g_1^{-1} + g_2^{-1} + ... + g_{k-1}^{-1})   1 : 2 : 3 : ... : n - 1


Exercise Set 10.9 (page 601) 




'0 

1 

1 

o' 


(a) 

0 

0 

1 

1 

(b) 


0 

0 

0 

0 



(c) 


-2 

-1 

3 


-1 

-1 

3 


(d) 


0 

0 

0 


.866 

-.500 

0 



1 






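(b) (0, 0, 0), (1, 0, 0), (1 1/2, 1, 0), and (1/2, 1, 0)

(c) (0, 0, 0), (1, .6, 0), (1, 1.6, 0), (0, 1, 0)

(a) $\begin{bmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$ (b) $\begin{bmatrix} -1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$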




1 

0 

0" 


-1 

1 

1- 



0 

0 

2 


2 

2 

2 


0 

2 

0 

, M, = 

0 

0 • 

• 0 

, Mi = 

0 

cos 20° 

— sin 20° 

0 

0 

1 

3- 


_0 

0 • 

• °_ 


_o 

sin 20° 

cos 20° 



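$M_4 = \begin{bmatrix} \cos(-45^\circ) & 0 & \sin(-45^\circ) \\ 0 & 1 & 0 \\ -\sin(-45^\circ) & 0 & \cos(-45^\circ) \end{bmatrix}$, $M_5 = \begin{bmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}$

(b) $P' = M_5 M_4 M_3 (M_1 P + M_2)$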



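(a) $M_1 = \begin{bmatrix} .3 & 0 & 0 \\ 0 & .5 & 0 \\ 0 & 0 & 1 \end{bmatrix}$, $M_2 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos 45^\circ & -\sin 45^\circ \\ 0 & \sin 45^\circ & \cos 45^\circ \end{bmatrix}$, $M_3 = \begin{bmatrix} 1 & 1 & \cdots & 1 \\ 0 & 0 & \cdots & 0 \\ 0 & 0 & \cdots & 0 \end{bmatrix}$

$M_4 = \begin{bmatrix} \cos 35^\circ & 0 & \sin 35^\circ \\ 0 & 1 & 0 \\ -\sin 35^\circ & 0 & \cos 35^\circ \end{bmatrix}$, $M_5 = \begin{bmatrix} \cos(-45^\circ) & -\sin(-45^\circ) & 0 \\ \sin(-45^\circ) & \cos(-45^\circ) & 0 \\ 0 & 0 & 1 \end{bmatrix}$

$M_6 = \begin{bmatrix} 0 & 0 & \cdots & 0 \\ 0 & 0 & \cdots & 0 \\ 1 & 1 & \cdots & 1 \end{bmatrix}$, $M_7 = \begin{bmatrix} 2 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$

(b) $P' = M_7 (M_5 M_4 (M_2 M_1 P + M_3) + M_6)$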



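$R_1 = \begin{bmatrix} \cos\beta & 0 & \sin\beta \\ 0 & 1 & 0 \\ -\sin\beta & 0 & \cos\beta \end{bmatrix}$, $R_2 = \begin{bmatrix} \cos\alpha & -\sin\alpha & 0 \\ \sin\alpha & \cos\alpha & 0 \\ 0 & 0 & 1 \end{bmatrix}$

$R_3 = \begin{bmatrix} \cos\theta & 0 & \sin\theta \\ 0 & 1 & 0 \\ -\sin\theta & 0 & \cos\theta \end{bmatrix}$, $R_4 = \begin{bmatrix} \cos\alpha & \sin\alpha & 0 \\ -\sin\alpha & \cos\alpha & 0 \\ 0 & 0 & 1 \end{bmatrix}$, $R_5 = \begin{bmatrix} \cos\beta & 0 & -\sin\beta \\ 0 & 1 & 0 \\ \sin\beta & 0 & \cos\beta \end{bmatrix}$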

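(a) $M = \begin{bmatrix} 1 & 0 & 0 & x_0 \\ 0 & 1 & 0 & y_0 \\ 0 & 0 & 1 & z_0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$ (b) $\begin{bmatrix} 1 & 0 & 0 & -5 \\ 0 & 1 & 0 & 9 \\ 0 & 0 & 1 & -3 \\ 0 & 0 & 0 & 1 \end{bmatrix}$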
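A 4 x 4 matrix whose last column is (x0, y0, z0, 1) and whose upper-left block is the identity translates a point written in homogeneous coordinates (x, y, z, 1). The sketch below applies the matrix with x0 = -5, y0 = 9, z0 = -3 to a sample point; the point (1, 2, 3) and the helper names are illustrative choices, not taken from the text.

```python
def translation_matrix(x0, y0, z0):
    """4 x 4 homogeneous matrix that translates by (x0, y0, z0)."""
    return [
        [1, 0, 0, x0],
        [0, 1, 0, y0],
        [0, 0, 1, z0],
        [0, 0, 0, 1],
    ]

def apply(m, point):
    """Multiply m by the homogeneous column (x, y, z, 1) and drop the final 1."""
    v = list(point) + [1]
    out = [sum(m[i][j] * v[j] for j in range(4)) for i in range(4)]
    return tuple(out[:3])

moved = apply(translation_matrix(-5, 9, -3), (1, 2, 3))
print(moved)
```

Writing translation as a matrix product is what lets it be composed with rotations and scalings by ordinary matrix multiplication.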




Exercise Set 10.10 (page 611)


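1. (a) $\begin{bmatrix} t_1 \\ t_2 \\ t_3 \\ t_4 \end{bmatrix} = \begin{bmatrix} 0 & \frac{1}{4} & \frac{1}{4} & 0 \\ \frac{1}{4} & 0 & 0 & \frac{1}{4} \\ \frac{1}{4} & 0 & 0 & \frac{1}{4} \\ 0 & \frac{1}{4} & \frac{1}{4} & 0 \end{bmatrix} \begin{bmatrix} t_1 \\ t_2 \\ t_3 \\ t_4 \end{bmatrix} + \begin{bmatrix} \frac{1}{2} \\ 0 \\ \frac{1}{2} \\ 0 \end{bmatrix}$

(b) $\mathbf{t} = \begin{bmatrix} \frac{3}{4} \\ \frac{1}{4} \\ \frac{3}{4} \\ \frac{1}{4} \end{bmatrix}$

(c) $\mathbf{t}^{(1)} = \begin{bmatrix} \frac{1}{2} \\ 0 \\ \frac{1}{2} \\ 0 \end{bmatrix}$, $\mathbf{t}^{(2)} = \begin{bmatrix} \frac{5}{8} \\ \frac{1}{8} \\ \frac{5}{8} \\ \frac{1}{8} \end{bmatrix}$, $\mathbf{t}^{(3)} = \begin{bmatrix} \frac{11}{16} \\ \frac{3}{16} \\ \frac{11}{16} \\ \frac{3}{16} \end{bmatrix}$, $\mathbf{t}^{(4)} = \begin{bmatrix} \frac{23}{32} \\ \frac{7}{32} \\ \frac{23}{32} \\ \frac{7}{32} \end{bmatrix}$, $\mathbf{t}^{(5)} = \begin{bmatrix} \frac{47}{64} \\ \frac{15}{64} \\ \frac{47}{64} \\ \frac{15}{64} \end{bmatrix}$, $\mathbf{t}^{(5)} - \mathbf{t} = \begin{bmatrix} -\frac{1}{64} \\ -\frac{1}{64} \\ -\frac{1}{64} \\ -\frac{1}{64} \end{bmatrix}$

(d) for $t_1$ and $t_3$, -12.9%; for $t_2$ and $t_4$, 5.2%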
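The iterates in part (c) behave like repeated application of an update of the form t -> Mt + b, starting from the zero vector. The sketch below assumes the 4 x 4 matrix M and the vector b shown in the code; they were inferred from the listed iterates (each fraction is reproduced exactly), not copied from the exercise statement.

```python
from fractions import Fraction as F

q = F(1, 4)
# Assumed iteration matrix and boundary vector (inferred from the iterates).
M = [
    [0, q, q, 0],
    [q, 0, 0, q],
    [q, 0, 0, q],
    [0, q, q, 0],
]
b = [F(1, 2), F(0), F(1, 2), F(0)]

def jacobi_step(t):
    """One update t -> M t + b in exact rational arithmetic."""
    return [sum(M[i][j] * t[j] for j in range(4)) + b[i] for i in range(4)]

def iterate(steps):
    """Return the list [t^(1), ..., t^(steps)] starting from t^(0) = 0."""
    t = [F(0)] * 4
    history = []
    for _ in range(steps):
        t = jacobi_step(t)
        history.append(t)
    return history

history = iterate(5)
for k, t in enumerate(history, start=1):
    print(k, [str(v) for v in t])
```

Each entry of the fifth iterate falls short of the fixed point (3/4, 1/4, 3/4, 1/4) by exactly 1/64, and the gap halves with every further step.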


2 . 


t (1) = [| 

t (2) = [g 


16 


13 

16 


1_ 

16 


If 

1 


8 J 


Exercise Set 10.11 (page 622) 

(c) x\ = (§, I) 

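(a) $\mathbf{x}_3^{(1)} = (1.40000, 1.20000)$ (b) Same as part (a)

$\mathbf{x}_3^{(2)} = (1.41000, 1.23000)$
$\mathbf{x}_3^{(3)} = (1.40900, 1.22700)$
$\mathbf{x}_3^{(4)} = (1.40910, 1.22730)$
$\mathbf{x}_3^{(5)} = (1.40909, 1.22727)$
$\mathbf{x}_3^{(6)} = (1.40909, 1.22727)$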
x( = (1, 1), x( = ( 2 , 0 ), x* = (1, 1) 

$x_7 + x_8 + x_9 = 13.00$
$x_4 + x_5 + x_6 = 15.00$
$x_1 + x_2 + x_3 = 8.00$
$.82843(x_6 + x_8) + .58579x_9 = 14.79$
$1.41421(x_3 + x_5 + x_7) = 14.31$
$.82843(x_2 + x_4) + .58579x_1 = 3.81$
$x_3 + x_6 + x_9 = 18.00$
$x_2 + x_5 + x_8 = 12.00$
$x_1 + x_4 + x_7 = 6.00$
$.82843(x_2 + x_6) + .58579x_3 = 10.51$
$1.41421(x_1 + x_5 + x_9) = 16.13$
$.82843(x_4 + x_8) + .58579x_7 = 7.04$


(c) $\mathbf{x}_3^{(1)} = (9.55000, 25.65000)$
$\mathbf{x}_3^{(2)} = (.59500, -1.21500)$
$\mathbf{x}_3^{(3)} = (1.49050, 1.47150)$
$\mathbf{x}_3^{(4)} = (1.40095, 1.20285)$
$\mathbf{x}_3^{(5)} = (1.40991, 1.22972)$
$\mathbf{x}_3^{(6)} = (1.40901, 1.22703)$


8.
$x_7 + x_8 + x_9 = 13.00$
$x_4 + x_5 + x_6 = 15.00$
$x_1 + x_2 + x_3 = 8.00$
$.04289(x_3 + x_5 + x_7) + .75000(x_6 + x_8) + .61396x_9 = 14.79$
$.91421(x_3 + x_5 + x_7) + .25000(x_2 + x_4 + x_6 + x_8) = 14.31$
$.04289(x_3 + x_5 + x_7) + .75000(x_2 + x_4) + .61396x_1 = 3.81$

$x_3 + x_6 + x_9 = 18.00$
$x_2 + x_5 + x_8 = 12.00$
$x_1 + x_4 + x_7 = 6.00$
$.04289(x_1 + x_5 + x_9) + .75000(x_2 + x_6) + .61396x_3 = 10.51$
$.91421(x_1 + x_5 + x_9) + .25000(x_2 + x_4 + x_6 + x_8) = 16.13$
$.04289(x_1 + x_5 + x_9) + .75000(x_4 + x_8) + .61396x_7 = 7.04$


Exercise Set 10.12 (page 637)


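1. $T_i\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) = \frac{12}{25}\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} e_i \\ f_i \end{bmatrix}$, $i = 1, 2, 3, 4$, where the four values of $\begin{bmatrix} e_i \\ f_i \end{bmatrix}$ are $\begin{bmatrix} 0 \\ 0 \end{bmatrix}$, $\begin{bmatrix} \frac{13}{25} \\ 0 \end{bmatrix}$, $\begin{bmatrix} 0 \\ \frac{13}{25} \end{bmatrix}$, and $\begin{bmatrix} \frac{13}{25} \\ \frac{13}{25} \end{bmatrix}$;

$d_H(S) = \ln(4)/\ln\left(\frac{25}{12}\right) = 1.888\ldots$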




2. $s \approx .47$; $d_H(S) \approx \ln(4)/\ln(1/.47) = 1.8\ldots$. Rotation angles: 0° (upper left); -90° (upper right); 180° (lower left); 180° (lower right)

3. (0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0), (0, 0, 1), (0, 0, 2), (1, 2, 0), (2, 1, 3), (2, 0, 1), (2, 0, 2), (2, 2, 0), (0, 3, 3)

(a) (i) $s = \frac{1}{3}$; (ii) all rotation angles are 0°; (iii) $d_H(S) = \ln(7)/\ln(3) = 1.771\ldots$. This set is a fractal.

(b) (i) $s = \frac{1}{2}$; (ii) all rotation angles are 180°; (iii) $d_H(S) = \ln(3)/\ln(2) = 1.584\ldots$. This set is a fractal.

(c) (i) $s = \frac{1}{2}$; (ii) rotation angles: -90° (top); 180° (lower left); 180° (lower right); (iii) $d_H(S) = \ln(3)/\ln(2) = 1.584\ldots$. This set is a fractal.

(d) (i) $s = \frac{1}{2}$; (ii) rotation angles: 90° (upper left); 180° (upper right); 180° (lower right); (iii) $d_H(S) = \ln(3)/\ln(2) = 1.584\ldots$. This set is a fractal.

$s = .8509\ldots$, $\theta = -2.69^\circ\ldots$   (0.766, 0.996) rounded to three decimal places   $d_H(S) = \ln(16)/\ln(4) = 2$

$\ln(4)/\ln\left(\frac{4}{3}\right) = 4.818\ldots$   $d_H(S) = \ln(8)/\ln(2) = 3$; the cube is not a fractal.
$k = 20$; $s = \frac{1}{3}$; $d_H(S) = \ln(20)/\ln(3) = 2.726\ldots$; the set is a fractal.

Area of $S_0 = 1$; area of $S_1 = \frac{8}{9} = 0.888\ldots$; area of $S_2 = \left(\frac{8}{9}\right)^2 = 0.790\ldots$; area of $S_3 = \left(\frac{8}{9}\right)^3 = 0.702\ldots$; area of $S_4 = \left(\frac{8}{9}\right)^4 = 0.624\ldots$

[Figure: initial set and its iterates through the fourth iterate]

$d_H(S) = \ln(2)/\ln(3) = 0.6309\ldots$
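Every dimension quoted in this exercise set has the form ln(k)/ln(1/s), where k is the number of congruent pieces and s is the scale factor of the similitudes. The following is a minimal sketch that reproduces several of the values; the function name `hausdorff_dimension` is illustrative.

```python
import math

def hausdorff_dimension(k, s):
    """Dimension ln(k) / ln(1/s) of a self-similar set made of k copies scaled by s."""
    return math.log(k) / math.log(1 / s)

examples = {
    "k = 4, s = 12/25": hausdorff_dimension(4, 12 / 25),
    "k = 7, s = 1/3": hausdorff_dimension(7, 1 / 3),
    "k = 3, s = 1/2": hausdorff_dimension(3, 1 / 2),
    "k = 20, s = 1/3": hausdorff_dimension(20, 1 / 3),
    "k = 2, s = 1/3": hausdorff_dimension(2, 1 / 3),
}
for label, value in examples.items():
    print(label, round(value, 4))
```

A value that is not an integer signals a fractal, which is why the cube with k = 8 and s = 1/2 (dimension 3) is not one.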


Exercise Set 10.13 (page 650)

Π(250) = 750, Π(25) = 50, Π(125) = 250, Π(30) = 60, Π(10) = 30, Π(50) = 150,
Π(3750) = 7500, Π(6) = 12, Π(5) = 10
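These periods can be recomputed by brute force. The sketch below assumes that Π(p) is the smallest positive n for which the nth power of the cat-map matrix C (rows 1 1 and 1 2) is congruent to the identity modulo p, which is the usual reading for Arnold's cat map on a p x p pixel grid; the function name `cat_map_period` is illustrative.

```python
def cat_map_period(p):
    """Smallest n >= 1 with C^n = I (mod p), where C = [[1, 1], [1, 2]]."""
    a, b, c, d = 1 % p, 1 % p, 1 % p, 2 % p   # current power of C, starting at C^1
    n = 1
    while (a, b, c, d) != (1 % p, 0, 0, 1 % p):
        # Multiply the current power on the right by C.
        a, b, c, d = (a + b) % p, (a + 2 * b) % p, (c + d) % p, (c + 2 * d) % p
        n += 1
    return n

for p in (5, 6, 10, 25, 30, 50, 125, 250, 3750):
    print(p, cat_map_period(p))
```

The loop always terminates because C has determinant 1 and is therefore invertible modulo every p.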

One 1-cycle: {(0, 0)}; one 3-cycle: {(1/2, 0), (1/2, 1/2), (0, 1/2)};

two 4-cycles: {(1/3, 0), (1/3, 1/3), (2/3, 0), (2/3, 2/3)} and {(0, 1/3), (1/3, 2/3), (0, 2/3), (2/3, 1/3)};

two 12-cycles: {(0, 1/6), (1/6, 1/3), (1/2, 5/6), (1/3, 1/6), (1/2, 2/3), (1/6, 5/6), (0, 5/6), (5/6, 2/3), (1/2, 1/6), (2/3, 5/6), (1/2, 1/3), (5/6, 1/6)} and {(1/6, 0), (1/6, 1/6), (1/3, 1/2), (5/6, 1/3), (1/6, 1/2), (2/3, 1/6), (5/6, 0), (5/6, 5/6), (2/3, 1/2), (1/6, 2/3), (5/6, 1/2), (1/3, 5/6)}; Π(6) = 12

(a) 3, 7, 10, 2, 12, 14, 11, 10, 6, 1, 7, 8, 0, 8, 8, 1, 9, 10, 4, 14, 3, 2, 5, 7, 12, 4, 1, 5, 6, 11, 2, 13, 0, 13, 13, 11, 9, 5, 14, 4, 3, 7, ...
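The sequence in part (a) is what a Fibonacci shift-register generator produces: each term is the sum of the previous two, reduced modulo a fixed integer. The sketch below assumes seeds 3 and 7 and modulus 15; the modulus is inferred from the listed values (for example 7 + 10 gives 2) rather than stated in this answer.

```python
def shift_register(x0, x1, modulus, count):
    """First `count` terms of x_{n+1} = (x_n + x_{n-1}) mod modulus."""
    seq = [x0 % modulus, x1 % modulus]
    while len(seq) < count:
        seq.append((seq[-1] + seq[-2]) % modulus)
    return seq[:count]

terms = shift_register(3, 7, 15, 42)
print(terms)
```

The pair 3, 7 reappears after 40 terms, so the generator cycles with period 40 for these seeds.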

(c) (5, 5), (10, 15), (4, 19), (2, 0), (2, 2), (4, 6), (10, 16), (5, 0), (5, 5), . . . 

(c) The first five iterates of (^,0) are (4, 4). (4, 4)' (w- w)’(w’ M)- and (f - w)- 




6. (b) The matrices of Anosov automorphisms are $\begin{bmatrix} 3 & 2 \\ 1 & 1 \end{bmatrix}$ and $\begin{bmatrix} 5 & 7 \\ 2 & 3 \end{bmatrix}$.


(c) The geometric effect of this transformation is to rotate each point in the interior of S clockwise by 90° about the center point (1/2, 1/2) of S.



and (|, |) form one 2-cycle, and (f . 5) and (f , |) form another 2-cycle. 

14. Begin with a 101 x 101 array of white pixels and add the letter ‘A’ in black pixels to it. Apply the mapping to this image, which will
scatter the black pixels throughout the image. Then superimpose the letter ‘B’ in black pixels onto this image. Apply the mapping 
again and then superimpose the letter ‘C’ in black pixels onto the resulting image. Repeat this procedure with the letters ‘D’ and 
‘E’. The next application of the mapping will return you to the letter ‘A’ with the pixels for the letters ‘B’ through ‘E’ scattered in 
the background. Four subsequent applications of T to this image will produce the remaining images. 


Exercise Set 10.14 (page 662) 


(a) GIYUOKEVBH 

(b) SFANEFZWJH 


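(a) $A^{-1} = \begin{bmatrix} 12 & 7 \\ 23 & 15 \end{bmatrix}$ (b) Not invertible (c) $A^{-1} = \begin{bmatrix} 1 & 19 \\ 23 & 24 \end{bmatrix}$

(d) Not invertible (e) Not invertible (f) $A^{-1} = \begin{bmatrix} 15 & 12 \\ 21 & 5 \end{bmatrix}$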
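Inverses of this kind can be computed with the 2 x 2 adjugate formula carried out in modular arithmetic. The following is a minimal sketch, assuming the modulus is 26 (the usual alphabet size for Hill ciphers; this answer does not state it) and using Python's built-in modular inverse `pow(d, -1, m)`, which requires Python 3.8 or later.

```python
def inverse_mod_2x2(a, m=26):
    """Inverse of a 2 x 2 integer matrix modulo m via the adjugate formula.

    Raises ValueError when det(a) has no inverse modulo m.
    """
    (p, q), (r, s) = a
    det = (p * s - q * r) % m
    det_inv = pow(det, -1, m)  # ValueError if gcd(det, m) != 1
    return [
        [(det_inv * s) % m, (-det_inv * q) % m],
        [(-det_inv * r) % m, (det_inv * p) % m],
    ]

# Inverting a matrix twice returns the original, so this recovers a matrix
# whose inverse modulo 26 has rows 12 7 and 23 15.
print(inverse_mod_2x2([[12, 7], [23, 15]]))
```

A matrix is reported as not invertible exactly when its determinant shares a factor with the modulus, in which case `pow` raises `ValueError`.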



3. WE LOVE MATH   Deciphering matrix = $\begin{bmatrix} 7 & 15 \\ 6 & 5 \end{bmatrix}$; enciphering matrix = $\begin{bmatrix} 7 & 5 \\ 2 & 15 \end{bmatrix}$

5. THEY SPLIT THE ATOM   6. I HAVE COME TO BURY CAESAR


(a) 010110001 


(b) 


1 

1 

1 


8. A is invertible modulo 29 if and only if $\det(A) \not\equiv 0 \pmod{29}$.


Exercise Set 10.15 (page 672)


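2. $a_n = \frac{1}{4} + \left(\frac{1}{2}\right)^{n+1}(a_0 - c_0)$

$b_n = \frac{1}{2}$

$c_n = \frac{1}{4} - \left(\frac{1}{2}\right)^{n+1}(a_0 - c_0)$

$n = 1, 2, \ldots$

$a_n \to \frac{1}{4}$, $b_n = \frac{1}{2}$, $c_n \to \frac{1}{4}$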




a i"+i - - + -b 0 - 4 c 0 ) 

1 1 

bl " +> ~ 3 “ 6(4r (2 " 0 “ b ° ~ 4c ' o) 

C’n+l = 0 


a 2n - p + 


bln = T 


C2 ' ! “ 12 ~ 


1 

6(4)" 


(2a 0 - feo - 4c 0 ) 


1 

6(4)" 


(2a 0 - b 0 - 4c 0 ) 


n = 1,2, ... 


Eigenvalues: 7.[ = 1, >.2 = eigenvectors: ei = 

5. 12 generations; .006% 

l ~ + y ^[(-3- V , 5)(l + V^r +1 + (-3 + V^)(l-V^)" +1 ] 

^■^rK 1 +V5)" +I + (i-V5r +1 ] 
v ^[a + Vsr + a-Vsn 

6. x (n) = J 

^■^[n + Vsr + a-Vsr] 

^■^[(1 + V'5)" +i + (1-n/5)" +1 ] 

\ + ^ • ^[(-3 - V5)(l + n / 5)" +1 + (-3 + V5)(l - x/5)" +1 ] 

8. $\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$



Exercise Set 10.16 (page 681)



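(a) $\lambda_1 = \frac{3}{2}$, $\mathbf{x}_1 = \begin{bmatrix} 1 \\ \frac{1}{3} \end{bmatrix}$ (b) $\mathbf{x}^{(1)} = \begin{bmatrix} 100 \\ 50 \end{bmatrix}$, $\mathbf{x}^{(2)} = \begin{bmatrix} 175 \\ 50 \end{bmatrix}$, $\mathbf{x}^{(3)} = \ldots$

(c) $\mathbf{x}^{(6)} \approx L\mathbf{x}^{(5)} = \begin{bmatrix} 857 \\ 285 \end{bmatrix}$, $\mathbf{x}^{(6)} \approx \lambda_1 \mathbf{x}^{(5)} = \begin{bmatrix} 855 \\ 287 \end{bmatrix}$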
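The two estimates in part (c) compare one more step of the Leslie iteration with multiplication by the dominant eigenvalue. The sketch below is written under stated assumptions: the Leslie matrix L with rows (1, 3/2) and (1/2, 0), the initial vector (100, 0), and rounding to whole animals after every step are all inferred from the values in this answer (they reproduce each listed entry), not quoted from the exercise.

```python
from fractions import Fraction as F
from math import floor

# Assumed Leslie matrix and dominant eigenvalue (see the lead-in).
L = [[F(1), F(3, 2)], [F(1, 2), F(0)]]
LAMBDA_1 = F(3, 2)

def round_half_up(v):
    """Round to the nearest integer, with halves rounded upward."""
    return floor(v + F(1, 2))

def step(x):
    """One generation: x -> L x, rounded to whole individuals."""
    return [round_half_up(L[0][0] * x[0] + L[0][1] * x[1]),
            round_half_up(L[1][0] * x[0] + L[1][1] * x[1])]

x = [100, 0]
history = []
for _ in range(6):
    x = step(x)
    history.append(x)

eigen_estimate = [round_half_up(LAMBDA_1 * v) for v in history[4]]
print(history)
print(eigen_estimate)
```

The closeness of the two estimates illustrates how quickly the age distribution settles into growth by the factor λ1 per period.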


7. 2.375 8. 1.49611 



Exercise Set 10.17 (page 690)

" 1 


(a) Yield = 33 j% of population; xi 


(b) Yield = 45.8% of population; xi 


1 

3 

j_ 

18 

r 

1 

2 


1 


; harvest 57.9% of youngest age class

$\mathbf{x}_1 = \begin{bmatrix} 1.000 & .845 & .824 & .795 & .755 & .699 & .626 & .532 & 0 & 0 & 0 & 0 \end{bmatrix}^T$, $L\mathbf{x}_1 = \begin{bmatrix} 2.090 & .845 & .824 & .795 & .755 & .699 & .626 & .532 & .418 & 0 & 0 & 0 \end{bmatrix}^T$

$\dfrac{1.090 + .418}{7.584} = .199$


$h_I = (R - 1)/(a_I b_1 b_2 \cdots b_{I-1} + \cdots + a_n b_1 b_2 \cdots b_{n-1})$

$h_I = \dfrac{a_1 + a_2 b_1 + \cdots + (a_{J-1} b_1 b_2 \cdots b_{J-2}) - 1}{a_I b_1 b_2 \cdots b_{I-1} + \cdots + a_{J-1} b_1 b_2 \cdots b_{J-2}}$


Exercise Set 10.18 (page 696)

1. $\dfrac{\pi^2}{3} + 4\cos t + \cos 2t + \dfrac{4}{9}\cos 3t$
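The coefficients in answer 1 can be checked numerically with the usual least squares (Fourier) formulas on [0, 2π]. The sketch below assumes that the function being approximated is f(t) = (t - π)^2; that function is not stated in this answer key and was chosen because it yields the constant π²/3 and cosine coefficients 4/k², with all sine coefficients zero. The midpoint rule stands in for exact integration.

```python
import math

def f(t):
    """Assumed target function (see the lead-in)."""
    return (t - math.pi) ** 2

def integrate(g, a, b, n=20000):
    """Composite midpoint rule for the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

def cosine_coefficient(k):
    """a_k = (1/pi) * integral of f(t) cos(kt) over [0, 2 pi]."""
    return integrate(lambda t: f(t) * math.cos(k * t), 0, 2 * math.pi) / math.pi

def sine_coefficient(k):
    """b_k = (1/pi) * integral of f(t) sin(kt) over [0, 2 pi]."""
    return integrate(lambda t: f(t) * math.sin(k * t), 0, 2 * math.pi) / math.pi

print(round(cosine_coefficient(0) / 2, 6))                  # constant term a_0 / 2
print([round(cosine_coefficient(k), 6) for k in (1, 2, 3)])
print([round(sine_coefficient(k), 6) for k in (1, 2, 3)])
```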

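$\dfrac{T^2}{3} + \dfrac{T^2}{\pi^2}\left(\cos\dfrac{2\pi}{T}t + \dfrac{1}{2^2}\cos\dfrac{4\pi}{T}t + \dfrac{1}{3^2}\cos\dfrac{6\pi}{T}t + \dfrac{1}{4^2}\cos\dfrac{8\pi}{T}t\right) - \dfrac{T^2}{\pi}\left(\sin\dfrac{2\pi}{T}t + \dfrac{1}{2}\sin\dfrac{4\pi}{T}t + \dfrac{1}{3}\sin\dfrac{6\pi}{T}t + \dfrac{1}{4}\sin\dfrac{8\pi}{T}t\right)$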


3. 


5. 


1 

- + 
n 

T 

4 ~ 


- sin / 

2 3 n 

cos 2 1 

15ti 

cos4t 

- 

n 

OO 

2 nt 

1 

6nt 

1 

— — — COS 

+ 

— r COS 

+ 

— , cos 

7V 2 \2 2 

T 

6 2 

T 

10 2 


1 1 1 

cos t cos 2 1 cos 3 1 — ■■ ■ 

1-3 3-5 5-7 


107Tf 

T 


+ ••• + 


1 2nnt\ 

cos 

(2 n) 2 T ) 


1 


(2 n - l)(2w+ 1) 


■ cos nt 


Exercise Set 10.19 (page 704) 

(a) Yes; v = i Vl + |t 2 + §v 3 (b) No; v = f Tl + fv 2 - ±v 3 

(c) Yes; v = fvj + |v 2 + 0v 3 (d) Yes; v = 4 V , + A Vz + ^ V3 

m = number of triangles = 7, n = number of vertex points = 7, 
k = number of boundary vertex points = 5; Equation (7) is 7 = 2(7) — 2 — 5. 

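3. $\mathbf{w} = M\mathbf{v} + \mathbf{b} = M(c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + c_3\mathbf{v}_3) + (c_1 + c_2 + c_3)\mathbf{b}$

$= c_1(M\mathbf{v}_1 + \mathbf{b}) + c_2(M\mathbf{v}_2 + \mathbf{b}) + c_3(M\mathbf{v}_3 + \mathbf{b}) = c_1\mathbf{w}_1 + c_2\mathbf{w}_2 + c_3\mathbf{w}_3$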
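The identity in answer 3 says that an affine map carries a convex combination of three vertices to the same convex combination of their images, which is what makes triangle-by-triangle warping consistent. The following is a small numerical check in exact arithmetic; the matrix M, the vector b, the triangle, and the coefficients are hypothetical values chosen only for illustration.

```python
from fractions import Fraction as F

def affine(M, b, v):
    """Return Mv + b for a 2 x 2 matrix M and 2-vectors b and v."""
    return (M[0][0] * v[0] + M[0][1] * v[1] + b[0],
            M[1][0] * v[0] + M[1][1] * v[1] + b[1])

def combine(coeffs, points):
    """Linear combination c1*p1 + c2*p2 + c3*p3 of 2-vectors."""
    return (sum(c * p[0] for c, p in zip(coeffs, points)),
            sum(c * p[1] for c, p in zip(coeffs, points)))

# Hypothetical data: any M, b, and coefficients summing to 1 would do.
M = [[F(2), F(1)], [F(-1), F(3)]]
b = (F(4), F(-2))
triangle = [(F(0), F(0)), (F(3), F(1)), (F(1), F(4))]
coeffs = (F(1, 2), F(1, 3), F(1, 6))

v = combine(coeffs, triangle)
w_direct = affine(M, b, v)
w_combined = combine(coeffs, [affine(M, b, p) for p in triangle])
print(w_direct, w_combined, w_direct == w_combined)
```

The check relies on the coefficients summing to 1; without that condition the translation term b would not distribute across the three vertices.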

4. (a) V! v 2 (b) V! v 2 







'l 

2 


T 




3 

-f 


"o’ 

(a) M = 

0 

1 

, b 

2 


(b) M = 


1 

i 

. b = 

1 


"l 

o’ 



2 



" l 

f 


l 

(c) M = 

0 

1 

. b = 



(d) M = 


2 

0 

, b = 




3 


2 


-1 


(a) Two of the coefficients are zero. (b) At least one of the coefficients is zero.
(c) None of the coefficients are zero.


(a) |vi + |v 2 + |v 3 


(b) 


8/3 

2 


Exercise Set 10.20 (page 712) 

1. (a) $[1/5 \;\; 2/5 \;\; 2/5]^T$ (b) $[1/2 \;\; 0 \;\; 1/2]^T$

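3. The matrix M in Equation (8) is

$$M = \delta B + \frac{(1 - \delta)}{n}\begin{bmatrix} 1 & 1 & \cdots & 1 \\ 1 & 1 & \cdots & 1 \\ \vdots & \vdots & & \vdots \\ 1 & 1 & \cdots & 1 \end{bmatrix} = \frac{\delta}{n}\begin{bmatrix} 1 & 1 & \cdots & 1 \\ 1 & 1 & \cdots & 1 \\ \vdots & \vdots & & \vdots \\ 1 & 1 & \cdots & 1 \end{bmatrix} + \frac{(1 - \delta)}{n}\begin{bmatrix} 1 & 1 & \cdots & 1 \\ 1 & 1 & \cdots & 1 \\ \vdots & \vdots & & \vdots \\ 1 & 1 & \cdots & 1 \end{bmatrix} = \frac{1}{n}\begin{bmatrix} 1 & 1 & \cdots & 1 \\ 1 & 1 & \cdots & 1 \\ \vdots & \vdots & & \vdots \\ 1 & 1 & \cdots & 1 \end{bmatrix}$$

which has the normalized eigenvector $[1/n \;\; 1/n \;\; \cdots \;\; 1/n]^T$. Thus all pages have page rank $1/n$.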




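5. $$\mathbf{x}^{(k)} = M\mathbf{x}^{(k-1)} = \left[\delta B + \frac{(1 - \delta)}{n}\begin{bmatrix} 1 & 1 & \cdots & 1 \\ 1 & 1 & \cdots & 1 \\ \vdots & \vdots & & \vdots \\ 1 & 1 & \cdots & 1 \end{bmatrix}\right]\mathbf{x}^{(k-1)} = \delta B\mathbf{x}^{(k-1)} + \frac{(1 - \delta)}{n}\begin{bmatrix} 1 & 1 & \cdots & 1 \\ 1 & 1 & \cdots & 1 \\ \vdots & \vdots & & \vdots \\ 1 & 1 & \cdots & 1 \end{bmatrix}\mathbf{x}^{(k-1)} = \delta B\mathbf{x}^{(k-1)} + \frac{(1 - \delta)}{n}\begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix}$$

where the last equality is true because for each row in the last matrix product we have $[1 \;\; 1 \;\; \cdots \;\; 1]\mathbf{x}^{(k-1)} = 1$ since the sum of the
entries of the state vector $\mathbf{x}^{(k-1)}$ is 1.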
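The identity in answer 5 means the full matrix M never has to be formed: one step needs only the product with B followed by adding the constant (1 - δ)/n to every entry. The sketch below compares the two computations in exact arithmetic; the 3-page matrix B, the state vector, and δ = 17/20 are hypothetical values chosen for illustration.

```python
from fractions import Fraction as F

def matvec(A, x):
    """Matrix-vector product with exact rational entries."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def full_matrix(B, delta):
    """M = delta*B + ((1 - delta)/n) * (matrix of all ones)."""
    n = len(B)
    return [[delta * B[i][j] + (1 - delta) / n for j in range(n)]
            for i in range(n)]

def damped_step(B, delta, x):
    """Shortcut update delta*B*x + (1 - delta)/n, valid when sum(x) == 1."""
    n = len(B)
    return [delta * y + (1 - delta) / n for y in matvec(B, x)]

# Hypothetical 3-page example: each column of B sums to 1.
B = [
    [F(0), F(1, 2), F(1)],
    [F(1, 2), F(0), F(0)],
    [F(1, 2), F(1, 2), F(0)],
]
x = [F(1, 2), F(1, 3), F(1, 6)]
delta = F(17, 20)

via_matrix = matvec(full_matrix(B, delta), x)
via_shortcut = damped_step(B, delta, x)
print(via_matrix == via_shortcut, sum(via_shortcut))
```

For a sparse link matrix B the shortcut is much cheaper, since M itself has no zero entries.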

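(a) Eigenvalues are 1 and -1. Eigenvector for eigenvalue 1 is $[1 \;\; 1]^T$. The iterates alternate between $[1 \;\; 0]^T$ and $[0 \;\; 1]^T$ and so do not converge. The total page count is

$$\begin{bmatrix} 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \begin{bmatrix} 2 \\ 1 \end{bmatrix}, \begin{bmatrix} 2 \\ 2 \end{bmatrix}, \begin{bmatrix} 3 \\ 2 \end{bmatrix}, \begin{bmatrix} 3 \\ 3 \end{bmatrix}, \ldots, \begin{bmatrix} k \\ k-1 \end{bmatrix}, \begin{bmatrix} k \\ k \end{bmatrix}, \begin{bmatrix} k+1 \\ k \end{bmatrix}, \ldots$$

and the fractional page count is

$$\frac{1}{1}\begin{bmatrix} 1 \\ 0 \end{bmatrix}, \frac{1}{2}\begin{bmatrix} 1 \\ 1 \end{bmatrix}, \frac{1}{3}\begin{bmatrix} 2 \\ 1 \end{bmatrix}, \frac{1}{4}\begin{bmatrix} 2 \\ 2 \end{bmatrix}, \frac{1}{5}\begin{bmatrix} 3 \\ 2 \end{bmatrix}, \frac{1}{6}\begin{bmatrix} 3 \\ 3 \end{bmatrix}, \ldots, \frac{1}{2k-1}\begin{bmatrix} k \\ k-1 \end{bmatrix}, \frac{1}{2k}\begin{bmatrix} k \\ k \end{bmatrix}, \frac{1}{2k+1}\begin{bmatrix} k+1 \\ k \end{bmatrix}, \ldots$$

which converges to $[1/2 \;\; 1/2]^T$.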

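(b) Eigenvalues are 1 and $-\delta$. Eigenvector for eigenvalue 1 is $[1 \;\; 1]^T$ for any $\delta$. Thus both pages have the same rank (as is obvious by symmetry).

$$M = \frac{1}{2}\begin{bmatrix} 1 - \delta & 1 + \delta \\ 1 + \delta & 1 - \delta \end{bmatrix}, \quad M^k = \frac{1}{2}\begin{bmatrix} 1 + (-\delta)^k & 1 - (-\delta)^k \\ 1 - (-\delta)^k & 1 + (-\delta)^k \end{bmatrix}, \quad \text{and so } M^k \text{ converges to } \frac{1}{2}\begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}$$

as k goes to infinity. Therefore, for any initial vector we have that

$$\mathbf{x}^{(k)} = M\mathbf{x}^{(k-1)} = M^{k-1}\mathbf{x}^{(1)} \to \frac{1}{2}\begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}\begin{bmatrix} x_1^{(1)} \\ x_2^{(1)} \end{bmatrix} = \frac{1}{2}\begin{bmatrix} x_1^{(1)} + x_2^{(1)} \\ x_1^{(1)} + x_2^{(1)} \end{bmatrix} = \frac{1}{2}\begin{bmatrix} 1 \\ 1 \end{bmatrix}$$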





Transition matrix = $\begin{bmatrix} 0 & 1/2 & 1/3 & 1/2 \\ 1 & 0 & 1/3 & 0 \\ 0 & 1/2 & 0 & 1/2 \\ 0 & 0 & 1/3 & 0 \end{bmatrix}$; eigenvector = $\begin{bmatrix} 4/13 \\ 5/13 \\ 3/13 \\ 1/13 \end{bmatrix}$
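The eigenvector above can be confirmed in two ways: exactly, by checking that the transition matrix maps it to itself, and approximately, by repeating the multiplication from a uniform starting vector. The following is a minimal sketch using the 4 x 4 transition matrix with rows (0, 1/2, 1/3, 1/2), (1, 0, 1/3, 0), (0, 1/2, 0, 1/2), (0, 0, 1/3, 0); the step count of 200 is an arbitrary choice that is more than enough for this matrix.

```python
from fractions import Fraction as F

B = [
    [F(0), F(1, 2), F(1, 3), F(1, 2)],
    [F(1), F(0), F(1, 3), F(0)],
    [F(0), F(1, 2), F(0), F(1, 2)],
    [F(0), F(0), F(1, 3), F(0)],
]
v = [F(4, 13), F(5, 13), F(3, 13), F(1, 13)]

def apply(A, x):
    """Matrix-vector product A x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def power_iteration(A, steps=200):
    """Repeated multiplication in floating point from the uniform vector."""
    n = len(A)
    rows = [[float(a) for a in row] for row in A]
    x = [1.0 / n] * n
    for _ in range(steps):
        x = apply(rows, x)
    return x

print(apply(B, v) == v)           # exact check that B v = v
print(power_iteration(B))         # numerical approximation of the same vector
```

No damping factor is used here; the plain iteration converges because this link structure is connected and not periodic.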


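11. Transition matrix = $\begin{bmatrix}
0 & 1/2 & 0 & \cdots & 0 & 0 & 0 \\
1 & 0 & 1/2 & \cdots & 0 & 0 & 0 \\
0 & 1/2 & 0 & \cdots & 0 & 0 & 0 \\
\vdots & \vdots & \vdots & & \vdots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & 0 & 1/2 & 0 \\
0 & 0 & 0 & \cdots & 1/2 & 0 & 1 \\
0 & 0 & 0 & \cdots & 0 & 1/2 & 0
\end{bmatrix}$; eigenvector = $\dfrac{1}{2(n-1)}\begin{bmatrix} 1 \\ 2 \\ 2 \\ \vdots \\ 2 \\ 1 \end{bmatrix}$

13.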



Transition matrix = $\begin{bmatrix} 0 & 1/2 & 0 & 1/2 \\ 1 & 0 & 1/2 & 0 \\ 0 & 1/2 & 0 & 1/2 \\ 0 & 0 & 1/2 & 0 \end{bmatrix}$; normalized eigenvector = $\begin{bmatrix} 2/8 \\ 3/8 \\ 2/8 \\ 1/8 \end{bmatrix}$
















INDEX 


A 

Absolute value: 
of complex number, 313, A7 
of determinant, 178 
Addition: 

associative law for, 39, 134 
by scalars, 184 

of vectors in R^2 and R^3, 132, 134
of vectors in R^n, 138
Additivity property, of linear 
transformation, 448 
Adjacency matrix, 705-706 
Adjoint, of a matrix, 122-124 
Aeronautics, yaw, pitch, and roll, 263 
Affine transformations, 633-635 
contracting, 633-634 
with warps, 696 

Age-specific population growth, 671-679 
female age distribution of animals, 674 
female age distribution of humans, 
678-679 

Leslie matrix, 673, 675-679 
limiting behavior, 674-679 
Ahmes Papyrus, 532 
Algebraic multiplicity, 309-310 
Algebraic operations, using vector 
components, 138-139 
Algebraic properties of matrices, 39-49
Algebraic properties of vectors, dot 
product, 147-148 

Algebraic Reconstruction Techniques 
(ARTs), 612, 615-618 
Alleles, 342 
Amps (unit), 86 
Angle: 

in R^n, 148-149, 155
between vectors, 146-149, 356-357 
Animal population harvesting, 681-687 
model for, 682-684 
only in youngest age class, 685-687 
optimal sustainable yield, 687 
uniform, 684-685 
Anosov automorphism, 648-649 
Anticommutativity, 325 
Antihomogeneity property, of complex 
Euclidean inner product, 316 
Antisymmetry property: 
of complex Euclidean inner product, 
316 

of dot product, 316 
Approximate integration, 93-94 
Approximations, best, 379-380 
Approximation problems, 394-396
Archimedes, 534-535 


Area: 

of parallelogram, 176 
of triangle, 176-177 

Argument, of complex number, 314, A8 
Arithmetic average, 347 
Arithmetic operations: 
matrices, 27-35, 39-43 
vectors in R^2 and R^3, 132-134
vectors in R^n, 137-139
Arnold, Vladimir I., 638 
Arnold’s cat map, 638-640, 644-646
Artificial intelligence, 493 
ARTs (Algebraic Reconstruction 
Techniques), 612, 615-618 
Associative law for addition, 39, 134 
Associative law for matrix multiplication, 
39, 40-41

Astronautics, yaw, pitch, and roll, 263 
Augmented matrices, 6-7, 11, 12, 18, 25, 
34 

Autosomal inheritance, 661-665 
Autosomal recessive diseases, 665-666 
Axes: 

rotation of, in 2-space, 404-406
rotation of, in 3-space, 406-407
Axis of rotation, 262 

B 

Babylonia, early applications in, 532-533 
Back-substitution, 19-20 
Backward phase, 15
Bakhshali Manuscript, 536 
Balancing (of chemical equation), 89 
Barnsley, Michael, 622, 632, 634 
Basis, 221-223 
change of, 229-234, 482-484
coordinate system for vector space, 
214-216 

for eigenvectors and eigenspaces, 
295-298 
finite basis, 214 
by inspection, 224-225 
linear combinations and, 245 
number of vectors in, 222 
ordered basis, 217 
orthogonal basis, 365 
for orthogonal complement, 360 
orthonormal basis, 365-367 
for row and column spaces, 241 
by row reduction, 242-244 
for row space of a matrix, 244-245 
standard basis, 214-216, 218
transition matrix, 231-234 
uniqueness of basis representation, 216 


Basis vectors, 214, 450-451
Bateman, Harry, 517 
Battery, 86 

Beam density, computed tomography, 614 
Begin-triangle, warps, 696 
Beltrami, Eugenio, 518 
Best approximation theorem, 379-380 
Block triangular form, 118 
Block upper triangular form, 103 
Bocher, Maxime, 7, 196 
Books, ISBN number of, 153 
Boundary data, temperature distribution, 
601-602 

Boundary mesh points, 603 
Bounded sets, 622-623 
Branches (network), 84 
Brightness, graphical images, 136 
Bunyakovsky, Viktor Yakovlevich, 149 

C 

C n , 317-320 

Calculus of variations, 174 
Cancellation law, 42 
Cantor set, 637 
Carroll, Lewis, 108 

Cat map (Arnold’s), 638-640, 644-646 
CAT scanner, 612 
Cattle Problem, 534-536
Cauchy, Augustin, 122, 149, 184 
Cauchy-Schwarz inequality, 148-149, 
355-356 

Cayley, Arthur, 30, 35, 44 
Central conic, 421-422
Central conic in standard position, 421 
Central ellipsoid in standard position, 428 
Central quadrics in standard position, 422 
Change-of-basis problem, 230-231, 482 
Change of variable, 419 
Chaos, 637-648 

Arnold’s cat map, 638-640, 644-646 
defined, 646 

dynamical systems, 647-648 
nonperiodic points, 645-646 
periodic points, 640-642 
period vs. pixel width, 642-643 
repeated mappings, 639-640 
tiled planes, 643-644 
Characteristic equation, 292, 306 
Characteristic polynomial, 293, 306 
Chemical equations, balancing with linear 
systems, 88-91 
Chemical formulas, 88 
Chessboard moves, 561 
China, early applications in, 533-534 


Chiu Chang Suan Shu, 533-534

Ciphers, 650-652. See also Cryptography 

Ciphertext, 650 

Ciphertext vector, 651 

Circle, through three points, 527 

Clamped splines, 548 

Cliques, directed graphs, 562-564 

Clockwise closed-loop convention, 86 

Closed economies, 96 

Closed Leontief model, 577-581 

Closed sets, 622-623 

Closure under addition, 184

Closure under scalar, 184

Coefficients: 

of linear combination of matrices, 32 
of linear combination of vectors, 139, 
195 

literal, 45 

Coefficient matrices, 34, 306, 491 
Cofactor, 106-107 
Cofactor expansion: 
of 2 x 2 matrices, 107-108 
determinants by, 105-110 
elementary row operations and, 
116-117 

Collinear vectors, 133-134 
Columns, cofactor expansion and choice 
of, 109 

Column matrices, 26-27 
Column-matrix form of vectors, 237 
Column space, 237, 238, 240, 241, 
251-252 

basis for, 241, 243 

equal dimensions of row and column 
space, 248-249 

orthogonal project on a, 383-384 
Column vectors, 26, 27, 40 
Column-vector form of vectors, 140 
Column-wheel, 568 
Combustion, linear systems to analyze 
combustion equation for methane, 
88-90 

Comma-delimited form of vectors, 139,
217, 237 

Common initial point, 134 
Commutative law for addition, 39 
Commutative law for multiplication, 41, 
47 

Complete reaction (chemical), 89 
Complex conjugates: 
of complex numbers, 313, A6 
of vectors, 315 
Complex dot product, 316 
Complex eigenvalues, 317-318, 320-322 
Complex eigenvectors, 317-318 


Complex Euclidean inner product, 
316-317 

Complex exponential functions, A10-A11
Complex inner products, 354 
Complex inner product space, 354 
Complex matrices, 315 
Complex n-space, 314
Complex n-tuples, 314
Complex numbers, 313-314, A5-A11
division of, A8, A9-A11
multiplication of, A6, A9-A11
polar form of, 314, A9-A11
Complex number system, A5 
Complex plane, A6 
Complex vector spaces, 184, 313-324 
Component form, 156 
Components (of a vector): 
algebraic operations using, 138-139 
calculating dot products using, 147-148 
complex n-tuples, 314
finding, 135-136
in R^2 and R^3, 134-135
vector components of u along a, 
159-160 
Composition: 
with identity operator, 461 
of linear transformations, 460-461,
463-464

matrices of, 477-478 
of matrix transformations, 270-273 
non-commutative nature of, 271
of one-to-one linear transformations,
463-464

of reflections, 272, 283-284 
of rotations, 271-272, 283 
of three transformations, 272-273 
Compression operator, 265, 283 
Computed tomography, 611-620 
Algebraic Reconstruction Techniques, 
615-620 

derivation of equations, 613-615 
scanning modes, 612 
Computers, LINPACK, 492 
Computer graphics, 593-598 
morphs, 695, 699-702 
rotation, 596-598 
scaling, 595 
translation, 596 

visualization of three-dimensional 
object, 593-595 
warps, 695-699 

Computer programs, LU-decomposition
and, 492 
Conclusion, A1 
Condensation, 108 
Congruent set, 622 


Conic sections (conics), 420-424
classifying, with eigenvalues, 425-426
quadratic forms of, 420-422
through five points, 528-529 
Conjugate transpose, 437-438 
Consistency, determining by elimination, 
65-66 

Consistent linear system, 3-4, 238-239
Constrained extremum, 429-432
Constrained extremum theorem, 430 
Constraint, 430 
Consumption matrix, 97, 582 
Consumption vectors, 97, 98 
Continuous derivatives, functions with, 
194 

Contracting affine transformation, 
633-634 

Contraction, 264, 449 
Contraction operators: 
and fractals, 622, 623, 626-627 
for general linear transformations, 449 
Contrapositive, A2 
Convergence: 
of power sequences, 501 
rate of, 507 
Converse, A2 
Convex combination, 696 
Coordinates, 217 
of generalized point, 136 
in R^2, 218-219

relative to standard basis for R^n, 218
Coordinate map, 229-230 
Coordinate systems, 212-214 
“basis vectors” for, 214
units of measurement, 213 
Coordinate vectors: 
computing, 232-233 
matrix form of, 217 
relative to orthonormal basis, 367 
relative to standard bases, 218 
Cormack, A. M., 612 
Corresponding linear systems, 169 
Cramer, Gabriel, 125 
Cramer’s rule, 125 
Critical points, 432 
Cross product, 172-179 
calculating, 173-174 
determinant form of, 175-176 
geometric interpretation of, 176-177 
notation, 173 
properties of, 174-175 
of standard unit vectors, 175-176 
Cross product terms, 418, 423-424 
Cryptography, 650-659 
breaking Hill ciphers, 657-659 
ciphers, 650-652
deciphering, 654-656
Hill ciphers, 651-652, 656-659 
modular arithmetic, 652-654 
CT. See Computed tomography
Cubic runout spline, 544-547 
Cubic spline, 541-544 
Cubic spline interpolation, 538-547 
cubic runout spline, 544-547
curve fitting, 538-539 
derivation of formula of cubic spline, 
541-544 

natural spline, 544-545
parabolic runout spline, 544-547
statement of problem, 539-540 
Current (electrical), 86 
Curve fitting, cubic spline interpolation, 
538-539 

D 

Damping factor, 708 
Dangling pages, 704 
Data compression, singular value 
decomposition, 521-524 
Deciphering matrix, 657 
Decomposition: 
eigenvalue decomposition, 514 
Hessenberg decomposition, 514 
LDU-decomposition, 498-499
LU-decomposition, 491-498, 513
PLU-decomposition, 499
Schur decomposition, 514 
self-similar sets, 623 
singular value decomposition, 516-519, 
521-524 

of square matrices, 514-515 
Degenerate conic, 420 
Degrees of freedom, 222 
Demand vector, 581
DeMoivre’s formula, A10 
Dense sets, in chaos theory, 645-646 
Dependency equations, 245-246 
Determinants, 45, 105-127 
by cofactor expansion, 105-110 
defined, 105 

of elementary matrices, 114-115
equivalence theorem, 126-127 
evaluating by row reduction, 113-117
general determinant, 108 
geometric interpretation of, 178-179 
of linear operator, 485 
of lower triangular matrix, 109-110
of matrix product, 120-121 
properties of, 116-124 
sums of, 120 
of 3 x 3 matrices, 110 
of 2 x 2 matrices, 110 


Devaney, Robert L., 646 
Deviation, 395 

Diagonal coefficient matrices, 328 
Diagonal entries, 516 
Diagonalizability: 
defined, 303 

nondiagonalizability of n x n matrix,
414-415

orthogonal diagonalizability, 441 
recognizing, 307 
of triangular matrices, 307 
Diagonalization: 
matrices, 302-311

orthogonal diagonalization, 409-416 
solution of linear system by, 328-330 
Diagonal matrices, 67-69, 286 
Dickson, Leonard Eugene, 123 
Difference: 
matrices, 28 
vectors, 133, 138 

Differential equations, 326-330, 454 
Differentiation, by matrix multiplication, 
468-469

Differentiation transformation, 453 
Digital communications, matrix form 
and, 254 

Dilation, 264, 449 
Dilation operators, 449, 622 
Dimensions: 
of spans, 222 
of vector spaces, 222 
Dimension theorem, for linear 
transformations, 454-455
Dirac matrices, 325 
Directed edges, 559 
Directed graphs, 559-564 
cliques, 562-564 
dominance-directed, 564-566 
Direct product, 146 
Direct sum, 290 

Discrete mean-value property, 603 
Discrete random walk, 608 
Discrete-time chaotic dynamical systems, 
647 

Discrete-time dynamical systems, 647 

Discriminant, 319 

Disjoint sets, A4 

Displacement, 163 

Distance, 346 

general inner product spaces, 357 
orthogonal projections for, 160-162 
between parallel planes, 162 
between a point and a plane, 161-162 
real inner product spaces, 346 
in R^n, 144-145

triangle inequality for, 149-150 


Distinct eigenvalues, 501 
Distributive property: 
of complex Euclidean inner product, 
316 

of dot product, 147-148 
Dodgson, Charles Lutwidge, 108 
Dominance-directed graphs, 564-566
Dominant eigenvalue, 501-503 
Dominant eigenvalue, of Leslie matrix, 
675 

Dominant genes, 661 
Dot product, 145-148 
algebraic properties of, 147-148 
antisymmetry property of, 316 
application of, 153 
calculating with, 148 
complex dot product, 316 
cross product and, 173-174 
dot product form of linear systems, 
168-169 

as matrix multiplication, 150-152 
relationships involving, 173-174 
symmetry property of, 147-148, 316 
of vectors, 150-152 
Drafting spline, 539 
Dynamical system, 332-334, 647-648 

E 

Ear: 

anatomy of, 689-690 
least squares hearing model, 689-694 
Echelon forms, 11-12, 21-22 
Economics, n-tuples and, 136 
Economic modeling, Leontief economic 
analysis with, 96-100, 577-584 
Economic sectors, 96 
Egypt, early applications in, 532 
Eigenspaces, 295-296, 306, 317 
bases for, 295-298 
of real symmetric matrix, 439-440 
Eigenvalues, 291-298, 306, 317-318 
complex eigenvalues, 317-318 
conic sections classified by using, 
425-426

dominant eigenvalues, 501-503 
of general linear transformations, 299 
of Hermitian, 439-440
of Hermitian matrices, 442 
invertibility and, 298 
of Leslie matrix, 675-679 
of linear operators, 485 
of square matrix, 307 
of symmetric matrices, 411
of 3 x 3 matrix, 293-294 
of triangular matrices, 294-295 
of 2 x 2 matrix, 319-320
Eigenvalue decomposition (EVD), 514
Eigenvectors, 291-298 
bases for eigenspaces and, 295-298 
complex eigenvectors, 317-318 
left/right eigenvectors, 301 
of real symmetric matrix, 439-440
of square matrix, 307 
of symmetric matrices, 411 
of 2 x 2 vector, 292 
Einstein, Albert, 135, 136 
Eisenstein, Gotthold, 30 
Electrical circuits: 

network analysis with linear systems, 
86-88 

n-tuples and, 136
Electrical current, 86 
Electrical potential, 86 
Electrical resistance, 86 
Elements (of a set), A3 
Elementary matrices, 52 
determinants, 114-115 
and homogeneous linear systems, 58 
invertibility, 54 

matrix operators corresponding to, 284 
Elementary row operations, 7-8, 53-54, 
240 

cofactor expansion and, 116-117
determinants and, 113-117 
and inverse operations, 54-57
and inverse row operations, 54-57
for inverting matrices, 56-57 
matrix multiplication, 53-54 
row reduction and determinants, 
113-117 

Elimination methods, 14-16, 65-66 
Ellipse, principal axes of, 423 
Elliptic paraboloid, 437 
Empty set, A4 
Enciphering, 650 
End-triangle, warps, 696 
Entries, 26, 27 

Equality, of complex numbers, A5 
Equal matrices, 27-28, 40 
Equal sets, A4 
Equal vectors, 132, 137-138 
Equilibrium temperature distribution, 
601-609 

boundary data, 601-602 
discrete formulation of problem, 
603-607 

mean-value property, 602-603 
Monte Carlo technique for, 608-609 
numerical technique for, 607-608 
Equivalence theorem, 384 
determinants, 126-127 


invertibility, 54-56, 298-299 
n x n matrix, 253-254, 277 
Equivalent statements, A2 
Equivalent vectors, 132, 137-138 
Errors: 

approximation problems, 395 
least squares error, 379 
mean square error, 395 
measurements of, 395 
percentage error, 507 
relative error, 507 
roundoff errors, 22 
Error vector, 381 
Estimated percentage error, 507 
Estimated relative error, 507-508 
Euclidean inner product, 346-348 
complex Euclidean inner product, 
316-317 

of vectors in R^2 or R^3, 145
Euclidean norm, 316 
Euclidean n-space, 346
Euclidean scaling, power method with, 
503-504 

Euler phi functions, 661 
Euler’s formula, A10
Evaluation inner product, 350-351 
Evaluation transformation, 450 
EVD (eigenvalue decomposition), 514 
Exchange matrix, 579 
Expansion operator, 265, 283-284 
Expected payoff, matrix games, 570 
Exponents, matrix laws, 47 
Exponential models, 393 

F 

Factorization, 491, 494 
Family influence, 560 
Fan-beam mode scanning, computed 
tomography, 612 
Fertile age class, 672 
Fibonacci, Leonardo, 52 
Fibonacci sequence, 52 
Fibonacci shift-register random-number 
generator, 648 
Fingerprint storage, 523 
Finite basis, 214 

Finite-dimensional inner product space, 
360, 373 

Finite-dimensional vector space, 214, 
224-225, 229-230 
First-order linear system, 326-328 
Fixed points, 642 
Floating-point numbers, 509 
Floating-point operation, 509 
Flops, 509-512 

Flow conservation, in networks, 84 


Forest management, 586-592 

Forward phase, 15

Forward substitution, 493 

4 x 6 matrix, rank and nullity of, 249-250

Fourier, Jean Baptiste, 398 

Fourier coefficients, 397 

Fourier series, 396-398 

Fractals, 622-635 

algorithms for generating, 629-632 
defined, 626 
in Euclidean plane, 622 
Hausdorff dimension of self-similar 
sets, 625-626 

Monte Carlo approach for, 632-633 
self-similar sets, 622-624 
similitudes, 626-629 
topological dimension of sets, 624-625 
Free variables, 13, 250 
Free variable theorem for homogeneous 
systems, 18-19 
Full column rank, 375 
Functions: 

with continuous derivatives, 194 
linear dependence of, 207-209 
Function spaces, 194-195
Fundamental spaces, 251-253 
Fundamental Theorem of Two-Person 
Zero-Sum Games, 571-572 

G 

Games of strategy: 
game theory, 568-569 
2x2 matrix games, 573-576 
two-person zero-sum games, 569-573 
Game theory, 568-569 
Gauss, Carl Friedrich, 15, 29, 106, 533 
Gaussian elimination, 11-16, 512, 513 
defined, 16 
roundoff errors, 22 
Gauss-Jordan elimination: 
of augmented matrix, 318, 513 
described, 15 

for homogeneous system, 18 
polynomial interpolation by, 92-93 
roundoff errors, 22 
using, 45, 512-513 
General determinant, 108 
General Electric CT system, 612 
Generalized Theorem of Pythagoras, 
358-359 

General solution, 13, 239, 326 
Genes, dominant and recessive, 661 
Genetics, 661-670 
autosomal inheritance, 662-665 
autosomal recessive diseases, 665-666 
inheritance traits, 661-662
X-linked inheritance, 666-670 
Genetic diseases, 665-666 
Genotypes, 342, 661-662 
defined, 661 

distribution in population, 662-665 
Geometric multiplicity, 309-310 
Geometric vectors, 131 
Geometry: 

of linear systems, 164-170 
quadratic forms in, 420-422
in R^n, 149-150

Gibbs, Josiah Willard, 146, 173 
Golub, Gene H., 518 
Gram, Jorgen Pederson, 371 
Gram-Schmidt process, 370-373, 375, 397 
Graphic images: 

images of lines under matrix operators, 
280-281 

n-tuples and, 136
RGB color model, 140 
Graph theory, 559-566 
cliques, 562-564 
directed graphs, 559-564 
dominance-directed graphs, 564-566
relations among members of sets, 559 
Grassmann, H.G., 184 
Greece, early applications in, 534-536
Growth matrix, forest management 
model, 588 

H 

Hadamard’s inequality, 129 
Harvesting: 

animal populations, 681-687 
forests, 586-592 

Harvesting matrix (animals), 682-684 

Harvest vector (forests), 588 

Hausdorff, Felix, 625 

Hausdorff dimension, 625-626 

Hearing, least squares model for, 689-694 

Hermite, Charles, 438 

Hermite polynomials, 220 

Hermitian matrices, 437-440 

Hesse, Ludwig Otto, 433 

Hessenberg decomposition, 514 

Hessenberg’s theorem, 415 

Hessian matrices, 433-434

Hilbert, David, 371 

Hilbert space, 371 

Hill, George William, 196 

Hill, Lester S., 651 

Hill 2-cipher, 652, 656 

Hill 3-cipher, 652 

Hill ciphers, 651-652, 656-659 

Hill n-cipher, 652


Homogeneity property: 
of complex Euclidean inner product, 
316 

of dot product, 147-148 
of linear transformation, 448 
Homogeneous equations, 157-158, 168 
Homogeneous linear equations, 2 
Homogeneous linear systems, 17-19, 239 
constant coefficient first-order, 327 
dimensions of solution space, 223-224 
and elementary matrices, 58 
free variable theorem for, 18-19 
solutions of, 198-199 
Homogeneous systems, solutions spaces 
of, 199 

Hooke’s law, 390 
Hounsfield, G. N., 612
Householder matrix, 409 
Householder reflection, 409 
Hue, graphical images, 136 
Human hearing, least squares model for, 
689-694 
Hyperplane, 618 
Hypothesis, A1 

I 

Idempotency, 51 
Identity matrices, 42-43
Identity operators: 
about, 448 

composition with, 461 
kernel and range of, 452 
matrices of, 476-477
Images: 

of basis vectors, 450-451
of lines under matrix operators, 280-281 
n-tuples and, 136
RGB color model, 140 
Image processing, data compression and, 
523-524 

Imaginary axis, A6 
Imaginary numbers. See Complex
numbers 
Imaginary part: 
of complex numbers, 313, A5 
of vectors and matrices, 314-315 
Inconsistent linear system, 3 
Indefinite quadratic forms, 424 
India, early applications in, 536 
Infinite-dimensional vector space, 214, 

216 

Inheritance, 661-665 
autosomal, 661-665 
X-linked, 661-662, 666-670 
Initial age distribution vector, 672 
Initial condition, 326 


Initial point, 131 
Initial-value problem, 326 
Inner product: 
algebraic properties of, 352 
calculating, 352 
complex inner products, 354 
Euclidean inner product, 145, 316-317, 
346-348 

evaluation inner product, 350-351 

examples of, 346-351 

linear transformation using, 449 

matrix inner products, 348 

on M_nn, 349-350

on real vector space, 345 

on R^n, 346-348

standard inner products, 346, 349-350 
Inner product space, 449 
complex inner product space, 354 
isomorphisms in, 469-470 
unit circle, 348 
unit sphere, 348 
Inputs, in economics, 96 
Input-output analysis, 96 
Input-output matrix, 579 
Instability, 22 
Integer coefficients, 294 
Integral transformation, 452 
Integration, approximate, 93-94 
Interior mesh points, 603 
Intermediate demand vector, 98 
Internet search engines, 704-710 
Interpolating curves, 539 
Interpolating polynomial, 91 
Interpolation, 539 
Intersection, A4 

Invariant under similarity, 303, 484-485
Inverse: 

of 2 x 2 matrices, 45-46 
of diagonal matrices, 68 
of matrix using its adjoint, 124 
of a product, 46-47

Inverse linear transformations, 462-463
Inverse matrices, 43-46
Inverse operations, 54-57
Inverse row operations, 54-57 
Inverse transformations, 477-478 
Inversion, solving linear systems by, 
45-46, 61-62
Inversion algorithm, 55 
Invertibility: 

determinant test for, 121-122 
eigenvalues and, 298 
of elementary matrices, 54 
equivalence theorem, 54-56 
matrix transformation and, 273-274 




test for determinant, 121-122 
of transition matrices, 232-233 
of triangular matrices, 69 
Invertible matrices: 
algebraic properties of, 43-46 
defined, 43 

and linear systems, 61-66 
modulo m, 654-656 
ISBN (books), 153 
Isomorphism, 466-470
Isotherms, 602 

Iterates (Jacobi iteration), 607-608 
Iterations: 

of Arnold’s cat map, 639 
Jacobi, 607-608 

J 

Jacobi iteration, 607-608 
Jordan, Camille, 515, 518 
Jordan, Wilhelm, 15 
Jordan canonical form, 515 
Junctions (network), 84, 86 

K 

Kaczmarz, S., 615 
Kalman, Dan, 413 
Kernel, 200, 452-454, 458
Kirchhoff, Gustav, 88 
Kirchhoff’s current law, 87 
Kirchhoff’s voltage law, 87 
kth principal submatrix, 426 

L 

Lagrange, Joseph Louis, 174 
Laguerre polynomials, 220 
LDU-decomposition, 498-499
LDU-factorization, 499
Leading 1, 11
Leading variables, 13, 250 
Least squares: 
curve fitting, 387-388 
mathematical modeling using, 387-392 
Least squares approximation, 395-398 
defined, 396 

in human hearing model, 689-694 
Least squares error, 379 
Least squares error vector, 379 
Least squares fit: 
of polynomial, 390-391 


of quadratic curve to data, 391-392 
straight line fit, 388-390 
Least squares polynomial fit, 390-391 
Least squares solutions, 389-390 
infinitely many, 392 
of linear systems, 378-379, 385 
QR-decomposition and, 385
straight line fit, 388-390 
unique, 391 

Least squares straight line fit, 388-389 
Left distributive law, 39 
Left eigenvectors, 301 
Legendre polynomials, 372-373 
Length, 142, 346, 357 
Leontief, Wassily, 96, 577 
Leontief economic models, 577-584 
closed model, 577-581 
economic systems, 577 
input-output models, 96-100 
open model, 96-100, 581-584 
Leontief equation, 98 
Leontief matrices, 98 
Leslie matrix age-specific population 
growth, 673, 675-679 
animal population harvesting, 682-684 
eigenvalues, 675-679 
Leslie model, of population growth, 
671-679 

Level curves, 432 
Limit cycle, 616 
Lines: 

image of, 281 

line segment from one point to another 
in R^2, 168

orthogonal projection on, 159 
orthogonal projection on lines through 
the origin, 266-267 
point-normal equations, 156-157 
through origin as subspaces, 192-193 
through two points, 526-527 
through two points in R^2, 167-168
vector and parametric equations in R^2
and R^3, 164-166

vector and parametric equations of in
R^4, 166-167
vector form of, 158, 165 
vectors orthogonal to, 157-158 
Linear algebra, 1. See also Linear
equations; Linear systems 
coordinate systems, 212-214 
earliest applications of, 531-536 
Linear beam theory, 539-540 
Linear combinations: 
basis and, 245 
history of term, 196 


of matrices, 32-33 

of vectors, 140, 144-145, 195, 197-198 
Linear dependence, 196 
Linear equations, 2-3, 168. See also 
Linear systems 
Linear form, 417-418 
Linear independence, 196, 202-210, 
226-227 

of polynomials, 206 
of sets, 202-206 

of standard unit vectors in R^3, 204
of standard unit vectors in R^4, 205
of standard unit vectors in R^n, 203-204
of two functions, 206-207 
using the Wronskian, 209-210 
Linearly dependent set, 203 
Linearly independent set, 203, 205 
Linear operators: 
determinants of, 485 
matrices of, 476, 481-482 
orthogonal matrices as, 403-404 
on P_2, 476-477
Linear systems, 2-3. See also 
Homogeneous linear systems 
applications, 84-94 

augmented matrices, 6-7, 11, 12, 18, 25, 
34 

for balancing chemical equations, 88-91 
coefficient matrix, 34 
with a common coefficient matrix, 
62-63 

comparison of procedures for solving, 
509-513 

computer solution, 1 
corresponding linear systems, 169 
cost estimate for solving, 509-512 
dot product form of, 168-169 
first-order linear system, 326-328 
general solution, 13
geometry of, 164-170 
with infinitely many solutions, 5-7 
least squares solutions of, 378-379, 385 
network analysis with, 84-88
nonhomogeneous, 19 
with no solutions, 5 
number of solutions, 61 
overdetermined/underdetermined, 
255-256 

polynomial interpolation, 91-94 
solution methods, 3, 4-7 
solutions, 3, 11

solving by elimination row operations, 

7-8 

solving by Gaussian elimination, 11-16,
21, 22, 512, 513 
solving by matrix inversion, 45-46,
61-62 

solving with Cramer’s rule, 126 
in three unknowns, 12-13 
Linear transformations: 
composition of, 460-461, 463-464
defined, 447 

dimension theorem for, 454-455
eigenvalues of, 299 
examples of, 449, 451 
inverse linear transformations, 462-463
matrices of, 472-475
one-to-one, 458-460
onto, 458-460
from P_n to P_{n+1}, 449
rank and nullity in, 454-455 
using inner product, 449 
Line segment, from one point to another 
in R^2, 168
Links, 704 
LINPACK, 492 
Literal coefficients, 45 
Liu Hui, 533 
Logarithmic models, 393 
Lower triangular matrices, 69, 295 
LU-decompositions, 491-498, 513
constructing, 497 
examples of, 494-497
finding, 494 
method, 492 

LU-factorization, 491, 494

M 

M_nn. See n x n matrices
Magnitude (norm), 142 
Main diagonal, 27, 516 
Mandelbrot, Benoit B., 622, 626 
Mantissa, 509 

Markov, Andrei Andreyevich, 336 
Markov chain, 334-340, 549-557 
limiting behavior of state vectors, 
553-557 

steady-state vector of, 339 
transition matrix for, 339-340, 550-553 
Markov matrix, 550 
Mathematical models, 387-388 
MATLAB, 492 

Matrices. See also matrices of specific size, 
e.g.: 2x2 matrices 
adjoint of, 122-124 
algebraic properties of, 39-49
arithmetic operations with, 27-35 
coefficient matrices, 34, 306, 491 
column matrices, 26-27 
complex matrices, 315
compositions of, 477-478 


defined, 1, 6, 26 
determinants, 105-127 
diagonal coefficient matrices, 328 
diagonalization, 302-311 
diagonal matrices, 67-69, 286 
dimension theorem for matrices, 250 
elementary matrices, 52, 54, 58, 
114-115, 284-285 
entries, 26, 27 
equality of, 27-28, 40 
examples of, 26-27 
fundamental spaces, 251-253 
Hermitian matrices, 437-440, 442 
Hessian matrices, 433-434
identity matrices, 42-43 
of identity operators, 476-477 
inner products generated by, 348-349 
inverse matrices, 43-46 
of inverse transformations, 477-478
invertibility, 54-56, 69, 121-122, 
232-233 

invertible matrices, 43-46, 61-66
inverting, 56-57 

Leontief economic analysis with, 
96-100 

linear combination, 32-33 
of linear operators, 476, 481-482
of linear transformations, 472-475 
lower triangular matrices, 109-110 
normal matrices, 442 
notation and terminology, 25-27, 34 
orthogonally diagonalizable matrices, 
410 

orthogonal matrices, 401-407

partitioned, 30-32 

permutation matrices, 499 

positive definite matrices, 426 

powers of, 46-47, 308-309

with proportional rows or columns, 115 

rank of, 250 

real and imaginary parts of, 314-315 
real matrices, 315, 320-321 
redundancy in, 254 
reflection matrices, 402 
rotation matrices, 262, 402 
row equivalents, 52 
row matrices, 26 
scalar multiples, 28-29 
similar matrices, 303 
singular/nonsingular matrices, 43, 44 
size of, 26, 27, 40 
skew-Hermitian matrices, 442 
skew-symmetric matrices, 442 
square matrices, 27, 35, 43, 67, 69, 
113-117, 307, 401, 514-515


standard matrices, 276, 286-287, 
383-384 

stochastic matrices, 338-339 

submatrices, 31, 427 

symmetric matrices, 70-71, 320, 411, 

433 

trace, 36 

transition matrices, 231-234, 482 
transpose, 34-35 

triangular matrices, 69-70, 294-295, 307 
unitary matrices, 437-438, 440-442
upper triangular matrices, 69, 294 
zero matrices, 41 
Matrix factorization, 321-322 
Matrix form of coordinate vector, 217
Matrix games: 
defined, 569 

two-person zero-sum, 569-573 
Matrix inner products, 348 
Matrix multiplication. See Multiplication 
(matrices) 

Matrix notation, 25-27, 34, 418 
Matrix operators: 
effect of, on unit square, 266 
geometry of invertible, 283-285 
graphics images of lines under matrix 
operators, 280-281
on R^2, 280-287
Matrix polynomials, 48 
Matrix spaces, transformations on, 449 
Matrix transformations, 75-81, 448 
composition of, 270-273 
defined, 447 

kernel and range of, 452-453
in R^2 and R^3, 259-267
zero transformations, 448, 452 
Maximization problems, for two-person 
zero-sum games, 573 
Maximum entry scaling, power method 
with, 504-507 
Mean square error, 395 
Mean-value property, 602-603 
Mechanical systems, n-tuples and, 137 
Menger sponge, 636 
Mesh points, 603-607 
Methane, linear systems to analyze 
combustion equation, 88-90 
Minor, 106-107 

Mixed strategies, of players in matrix 
games, 572 

m x n matrices (M_mn):
real vector spaces, 186-187 
standard basis for, 215-216 
Modular arithmetic, 638, 652-654 
Modulus:

of complex numbers, 313, A7 
defined, 653 

Monte Carlo technique: 
fractal generation, 632-633 
temperature distribution determination, 
608-609 

Morphs, 695, 699-702 
Multiplication (matrices), 29-30. See also 
Product (of matrices) 
associative law for, 39, 40-41 
column-row expansion, 33-34 
by columns and by rows, 31-32 
differentiation by, 468-469
dot products as, 150-152 
elementary row operations, 53-54 
by invertible matrix, 285 
order and, 41 

Multiplication (vectors). See also Cross 
product; Euclidean inner product; Inner 
product; Product (of vectors) 
in R^2 and R^3, 133
by scalars, 184 
Multiplicative inverse: 
of complex number, A7 
of modulo m, 654

N 

Natural isomorphism, 468 
Natural spline, 544-545 
n-cycle, 642

n-dimensional vector space, 224
Negative, of vector, 133 
Negative definite quadratic forms, 424 
Negative pole, 86 

Negative semidefinite quadratic forms, 

424 

Net reproduction rate, 679 
Networks, defined, 84 
Network analysis, with linear systems, 
84-88 

n x n matrices (M_nn):
equivalent statements, 254, 277 
Hessenberg’s theorem, 415 
nondiagonalizability of, 414-415
standard inner products on, 349-350 
subspaces of, 193 
Nodes (network), 84, 86 
Nonharvest vector (forests), 587 
Nonhomogeneous linear systems, 19 
Nonoverlapping sets, 622, 623 
Nonperiodic pixel points, 645-646 
Nonsingular matrices, 43 
Nontrivial solution, 17 
Nonzero vectors, 200 


Norm (length), 142, 160, 346 
calculating, 143 

complex Euclidean inner product and, 
316-317 

Euclidean norm, 316 
real inner product spaces, 346 
of vector in C[a, b], 351-352 
Normal, 156 
Normal equations, 380 
Normalization, 144 
Normal matrices, 442 
Normal system, 380 
n-space, 135, 136. See also R^n

Nullity, 454-455
of 4 x 6 matrix, 249-250 
sum of, 251
Null space, 237, 240 
Numerical analysis, 11
Numerical coefficients, 45 

O 

Ohms (unit), 86 
Ohm’s law, 86 

1-Step connection, directed graphs, 561, 
564-565 

One-to-one linear transformations, 
458-460, 463-464

Onto linear transformations, 458-460
Open economies, Leontief analysis of, 
96-100 

Open Leontief model, 581-584 
Open sectors, 96 

Operators, 449, 460. See also Linear 
operators 
Optimal strategies: 

2x2 matrix games, 575-576 
two-person zero-sum games, 571-573 
Optimal sustainable harvesting policy, 687 
Optimal sustainable yield: 
animal harvesting, 687 
forest harvesting, 586, 589-592 
Optimization, using quadratic forms, 
429-435
Orbits, 528-529 
Order: 

of differential equation, 326 
matrix multiplication and, 41 
of trigonometric polynomial, 396 
Ordered basis, 217 
Ordered n-tuple, 3, 136 
Ordered pair, 3 
Ordered sets, A4 
Ordered triple, 3 
Order n, 396

Orthogonal basis, 365, 367-368, 373 
Orthogonal change of variable, 420 


Orthogonal complement, 252-253, 
359-360 

Orthogonal diagonalization, 409-416, 441
Orthogonality: 
defined, 364 
inner product and, 358 
of row vectors and solution vectors, 169 
Orthogonally diagonalizable matrices, 410 
Orthogonal matrices, 401-407
Orthogonal operators, 404 
Orthogonal projections, 158-160, 368-370 
with Algebraic Reconstruction 
Technique, 615-618 
on a column space, 383-384 
geometric interpretation of, 369-370 
kernel and range of, 452-453
on lines through the origin, 266-267 
on a subspace, 381-382 
Orthogonal projection operators, 260 
Orthogonal sets, 155, 364 
Orthogonal vectors, 155-158, 316 
in M_22, 358
in P_2, 358

Orthonormal basis, 365-367, 370, 

396-397 
change of, 404 

coordinate vectors relative to, 367 
from orthogonal basis, 367-368 
orthonormal sets extended to, 373 
Orthonormality, 364 
Orthonormal sets, 365 
constructing, 364-365
extended to orthonormal bases, 373 
Outputs, in economics, 96 
Outside demand vector, 97, 98 
Overdetermined linear system, 255-256 
Overlapping sets, 622, 623 

P 

P_n. See Polynomials

P_2:

linear operators on, 476-477
orthogonal vectors in, 358 
Theorem of Pythagoras in, 359 
Page ranks, 705 

Parabolic runout spline, 544-547 
Parallel mode scanning, computed 
tomography, 612 
Parallelogram, area of, 176 
Parallelogram equation for vectors, 150 
Parallelogram rule for vector addition, 

132 

Parallel planes, distance between, 162 
Parallel vectors, 133-134 
Parameters, 5, 13, 164 
Parametric equations, 6
of lines and planes in R^4, 166-167
of lines in R^2 and R^3, 164-166
of planes in R^3, 164-166
Particular solution, 239 
Partitioned matrices, 30-32 
Pauli spin matrices, 325 
Payoff, matrix games, 569 
Payoff matrix, 569, 572 
Percentage error, 507 
Period, of a pixel map, 642 
Periodic splines, 548 
Permutation matrices, 499 
Perpendicular vectors, 155 
Photographs, data compression and 
image processing, 523-524 
Piazzi, Giuseppe, 15 
Picture, 640 

Picture-density, of begin-triangle, 696 

Pine forest growth, 591-592 

Pitch (aircraft), 263 

Pivot column, 21-22 

Pivot position, 21-22 

Pixels: 

data compression and image processing, 
523 

defined, 640 
Pixel maps, 640-643 
Pixel points: 
defined, 641 
nonperiodic, 645-646 
Plaintext, 650 
Plaintext vector, 651 
Planes: 

distance between a point and a plane, 
161-162 

distance between parallel planes, 162 
point-normal equations, 156-157 
through origin as subspaces, 193 
through three points, 529 
tiled, 643-644 

vector and parametric equations in R^3,
164-166 

vector and parametric equations of in
R^4, 166-167
vector form of, 158, 165 
vectors orthogonal to, 157-158 
PLU-decomposition, 499
PLU-factorization, 499
Plus-minus theorem, 223-224 
Points: 

constructing curves and surfaces 
through, 526-530 

distance between a point and a plane, 
161-162 

Point-normal equations, 156-157 


Polar form, of complex numbers, 314, 
A8-A9 

Poles (battery), 86 
Polygraphic system, 651
Polynomials (P_n), 48
characteristic polynomial, 293, 306 
cubic, 539-547 
least squares fit of, 390-391 
Legendre polynomials, 372-373 
linear independence of, 206 
linearly independent set in, 205 
linear transformation, 449 
spanning set for, 197 
standard basis for, 214 
standard inner product on, 350-351 
subspaces of, 194 

trigonometric polynomial, 396-397 
Polynomial interpolation, 91-94 
Population growth, age-specific, 671-679 
Population waves, 676 
Positive definite matrices, 426 
Positive definite quadratic forms, 424-425
Positive pole, 86 

Positive semidefinite quadratic forms, 424 
Positivity property: 
of complex Euclidean inner product, 
317 

of dot product, 147-148 
Power, of vertex of dominance-directed 
graph, 566 

Power function models, 393 
Power method, 501-508 
with Euclidean scaling, 503-504 
with maximum entry scaling, 504-507
stopping procedures, 508 
Powers of a matrix, 46-47, 68, 308-309
Power sequence generated by A, 501 
Price vector, 579 
Principal argument, A8 
Principal axes, 423 
Principal axes theorem, 420, 423 
Principal submatrices, 427 
Probability, 334 

Probability (Markov) matrix, 550 
Probability transition matrix, 706 
Probability vector, 334, 551 
Product (of matrices), 28-30 
determinants of, 120-121 
inverse of, 46-47 
as linear combination, 32-33 
of lower triangular matrices, 69 
of symmetric matrices, 71 
transpose of, 49 
Product (of vectors): 
cross product, 172-179 
scalar multiple in R^2 and R^3, 133


Products (in chemical equation), 89 
Production vector, 97, 98, 581 
Productive consumption matrix, 583-584 
Productive open economies, 98-100 
Profitable industries, in Leontief model, 
584 

Profitable sectors, 99-100 
Projection operators, 260-261, 275-276 
Projection theorem, 158-159, 368 
Proofs, A1-A4 

Pure imaginary complex numbers, A5 
Pure strategies, of players in matrix 
games, 572 

Q 

QR-decomposition, 374, 385
Quadratic curve, of least squares fit, 
391-392 

Quadratic forms, 417-422
applications of, 419-420
change of variable, 419 
conic sections, 420-422
expressing in matrix notation, 418 
indefinite quadratic forms, 424 
negative definite quadratic forms, 424 
negative semidefinite quadratic forms, 
424 

optimization using, 429-435
positive definite quadratic forms, 
424-425

positive semidefinite quadratic forms, 
424 

principal axes theorem, 420 
Quadratic form associated with A, 418 
Quotient, A7 

R 

R^n:

coordinates relative to standard basis 
for, 218 

distance in, 144-145 
Euclidean inner product, 346-348 
geometry in, 149-150 
linear independence of standard unit 
vectors in, 203-204 
norm of a vector, 142-143 
span in standard unit vector, 196 
spanning in, 196 
standard basis for, 214 
standard unit vectors in, 144 
Theorem of Pythagoras in, 160 
transition matrices for, 233-234 
two-point vector equations in, 167-168 
vector forms of lines and planes in, 166 
vectors in, 135-139 
as vector space, 185 
R^2:

Anosov automorphism, 648-649 
dot product of vectors in, 145 
line segment from one point to another 
in, 168 

lines through origin are subspaces of, 
192-193 

lines through two points in, 167-168 
matrix operators on, 280-287 
matrix transformations in, 259-262, 
264-267 

norm of a vector, 142-143 
parametric equations, of lines in, 
164-166 

self-similar sets in, 622-623 
shears in, 265-266 
spanning in, 196-197 
unit circles in, 348 
vector addition in, 132, 134 
vectors in, 131-140 
R^3:

coordinates in, 218-219 
dot product of vectors in, 145 
linear independence of standard unit 
vectors in, 204 

lines through origin are subspaces of, 
192-193 

matrix transformations in, 259-265 
norm of a vector, 142-143 
orthogonal set in, 364 
rotations in, 262-263 
spanning in, 196-197 
standard basis for, 215
vector addition in, 132, 134 
vector and parametric equations of lines 
in, 164-166 

vector and parametric equations of 
planes in, 164-166 
vectors in, 131-140 
R^4:

cosine of angle between two vectors in, 
357 

linear independence of standard unit 
vectors in, 205 

Theorem of Pythagoras in, 160 
vector and parametric equations of lines 
and planes in, 166-167 
Random iteration algorithm, 632 
Range, 452-454
Rank, 454-455
of 4 x 6 matrix, 249-250 
of an approximation, 523 
dimension theorem for matrices, 250 
maximum value for, 250 
redundancy in a matrix and, 254 
sum of, 251


Rate of convergence, 507 
Rayleigh, John William Strutt, 506 
Rayleigh quotient, 505 
Reactants (in chemical equation), 89 
Real axis, A6 

Real inner product space, 345, 355-356 
Real line, 135 

Real matrices, 315, 320-321 
Real part: 

of complex numbers, 313, A5 
of vectors and matrices, 314-315 
Real-valued functions, vector space of,
187 

Real vector space, 183, 184, 345 
Recessive genes, 661 
Reciprocals: 
of complex number, A7 
of modulo m, 654 

Rectangular coordinate systems, 212-213 
Reduced row echelon forms, 11-12, 21,
318 

Reduced singular value decomposition, 
521 

Reduced singular value expansion, 522 
Redundancy, in matrices, 254 
Reflections, composition of, 272, 284-285 
Reflection matrices, 402 
Reflection operators, 259-260, 267 
Regression line, 389 
Regular Markov chain, 338, 554 
Regular stochastic matrices, 338-339 
Regular transition matrix, 554 
Relative error, 507 
Relative maximum, 433, 434 
Relative minimum, 432, 434 
Repeated mappings, of Arnold’s cat map, 
639-640 

Replacement matrix, forest management 
model, 588 
Residuals, 389 

Residue, of a modulo m, 653-654 
Resistance (electrical), 86 
Resistor, 86 
Resultant, 154 

Revection transformation, computer 
graphics, 599 
RGB color cube, 140 
RGB color model, 140 
RGB space, 140 
Rhind Papyrus, 532 
Right circular cylinder, 437 
Right distributive law, 39 
Right eigenvectors, 301 
Right-hand rule, 176, 262 
Roll (aircraft), 263 


Rotations: 

composition of, 271-272, 283 
kernel and range of, 453 
in R^3, 262-263
Rotation equations, 262, 405 
Rotation matrices, 262, 402 
Rotation of axes: 
in 2-space, 404-406 
in 3-space, 406-407
Rotation operator, 261-263 
properties of, 275 
on R^3, 262-263
Rotation transformation: 
computer graphics, 596-598 
self-similar sets, 626 
Roundoff errors, 22 

Rows, cofactor expansion and choice of 
row, 109 

Row-column method, 31-32 
Row echelon form, 11-12, 14-15, 21-22,
241 

Row equivalents, 52 
Row matrices, 26 
Row-matrix form of vectors, 237 
Row operations. See Elementary row
operations 
Row reduction: 
basis by, 242-244 

evaluating determinants by, 113-117 
Row space, 237, 240, 241, 251-252 
basis by row reduction, 242-243 
basis for, 241, 244-245 
equal dimensions of row and column 
space, 248-249 

Row vectors, 26, 27, 40, 168-169, 237 
Row-vector form of vectors, 139
Row-wheel, 568 
Runout splines, 544-547 

S 

Saddle points, 433, 434, 572 
Sample points, 350 
Saturation, graphical images, 136 
Scalars, 26, 131, 133 
from vector multiples, 172 
vector space scalars, 184 
Scalar moment, 180 
Scalar multiples, 28-29, 184 
Scalar multiplication, 133, 184 
Scalar triple product, 177 
Scaling: 

Euclidean scaling, 503-504 
maximum entry scaling, 504-507 
Scaling transformation: 
computer graphics, 595 
self-similar sets, 622, 626-627 
Schmidt, Erhardt, 371, 518
Schur, Issai, 414, 415 
Schur decomposition, 415, 514 
Schur’s theorem, 415
Schwarz, Hermann Amandus, 149 
Search engines, Internet, 704-710 
Second derivative test, 433, 434 
Sectors (economic), 96 
Self-similar sets, 622-626 
Sensitivity to initial conditions, dynamical 
systems, 647 
Sets, A3-A4 

linear independence of, 202-206 
relations among members of, 559 
self-similar sets, 622-626 
Set-builder notation, A3-A4 
Shear operators, 265-266, 284-285
Shear transformation, computer graphics, 
599 

Sheep harvesting, 684-685 
Shifting operators, 460 
Sierpinski, Waclaw, 624 
Sierpinski carpet, 624, 626, 628-631, 633, 
636 

Sierpinski triangle, 624, 626, 628-629, 
631-632 

Similarity invariants, 303, 484-485
Similarity transformations, 302 
Similar matrices, 303 
Similitudes, 626-629 
Singular matrices, 43, 44 
Singular values, 515-516 
Singular value decomposition (SVD), 
516-519, 521-524 
Skew-Hermitian matrices, 442 
Skew product, 173 
Skew-symmetric matrices, 442 
Solutions: 

best approximations, 379-380 
comparison of procedures for solving 
linear systems, 509-513 
cost of, 509-512 
factoring, 491 
flops and, 509-512 

Gaussian elimination, 11-16, 22, 512,
513 

Gauss-Jordan elimination, 15, 18, 21,
22, 45-46, 92-93, 318, 512-513
general solution, 13, 239, 326 
of homogeneous linear systems, 

198-199 

least squares solutions, 378-379, 385 
of linear systems, 3, 11 
of linear systems by diagonalization, 
328-330 

of linear systems by factoring, 491 


of linear systems with initial conditions, 
327-328 

particular solution, 239 
power method, 501-508 
trivial/nontrivial solutions, 17, 327 
Solutions spaces, of homogeneous 
systems, 199 

Solution vectors, 168-169 
Sound waves, in human ear, 689-694 
Spacecraft, yaw, pitch, and roll, 263 
Spanning: 

in R^2 and R^3, 196-197
in R^n, 196
testing for, 198 
Spanning sets, 197, 200, 216 
Spans, 196, 222 

Spectral decomposition of A, 413-414
Sphere, through four points, 529-530 
Spline interpolation, cubic, 538-547 
Spring constant, 390 
Square matrices, 43, 67, 69, 401 
decompositions of, 514-515 
determinants of, 113-117 
eigenvalues of, 307 
of order n, 27 
trace, 36 
transpose, 35 
Standard basis: 

coordinates relative to standard basis 
for R^n, 218

coordinate vectors relative to, 218 

for M_mn, 215-216

for polynomials, 214 

for R^3, 215

for R^n, 214

Standard inner product: 
defined, 346 
on polynomials, 350 
on vector space, 349-350 
Standard matrices: 
for matrix transformation, 286-287 
for T^-1, 276

Standard unit vectors, 144, 175-176 
linear independence in R^3, 204
linear independence in R^4, 205
linear independence in R^n, 203-204
in span R^n, 196
State of a particle system, 137 
State of the variable, 332 
State vector, 334 
of Markov chains, 551, 553-557 
webgraph, 706 
Static equilibrium, 155 
Steady-state vector, of Markov chain, 339, 
555-556 

Stochastic matrices, 338-339, 550 


Stochastic processes, 334 
Stopping procedures, 508 
Strategies, of players in matrix games, 
570-573 

Strictly determined games, 572 
String theory, 135, 136 
Subdiagonal, 415 
Submatrices, 31, 427 
Subsets, A4 

Subspaces, 191-200, 453
creating, 195-198 
defined, 191 
examples of, 192-200 
of M nn , 193 

orthogonal projections on, 381-382 
of polynomials, 194 
of polynomials ( P n ), 194 
of R 2 and R 3 , 192-193 
zero subspace, 192 
Substitution ciphers, 650 
Subtraction: 

of vectors in R 2 and R 3 , 133 
of vectors in R n , 138 
Sum: 

direct, 290 
matrices, 28, 47 
of rank and nullity, 251
of vectors in R 2 and R 3 , 132, 134 
of vectors in R n , 138 
SVD (singular value decomposition), 
516-519, 521-524 
Sylvester, James, 35, 107, 518 
Sylvester’s inequality, 259 
Symmetric matrices, 70-71, 320 
eigenvalues of, 411
Hessian matrices, 433-434
Symmetry property, of dot product, 
147-148, 316 

T 

T^-1, standard matrix for, 276
Taussky-Todd, Olga, 319 
Technology Matrix, 97 
Television, market share as dynamical 
system, 332-334 

Temperature distribution, at equilibrium. 
See Equilibrium temperature 
Terminal point, 131 
Theorem of Pythagoras: 
generalized Theorem of Pythagoras, 
358-359 
in R 4 , 160 
in R n , 160
3x3 matrices: 
adjoint, 123 
determinants, 110 
eigenvalues, 293-294
orthogonal matrix, 401-402
QR-decomposition of, 375
Three-dimensional object visualization, 
593-595 
3-space, 131 
cross product, 172-179 
scalar triple product, 177 
3-Step connection, directed graphs, 561 
Three-Step Procedure, 474-475 
3-tuples, 135 
Tien-Yien Li, 637 
Tiled planes, 643-644 
Time, as fourth dimension, 135 
Time-varying morphs, 699-702 
Time-varying warps, 698-699 
Topological dimensions, 624-625 
Topology, 624-625 
Torque, 180 
Tournaments, 564 
Trace, square matrices, 36 
Traffic flow, network analysis with linear 
systems, 85-86 

Transformations. See also Linear 
transformations; Matrix 
transformations 

differentiation transformation, 453 
evaluation transformation, 450 
integral transformation, 452 
inverse transformations, 477-478
on matrix spaces, 449 
one-to-one linear transformation, 459 
Transition matrices, 231-234, 482 
invertibility of, 232-233 
Markov chains, 550-553 
for R n , 233-234 

Transition probability, Markov chains, 
549 

Translation, 132, 450 
Translation transformation, computer 
graphics, 596 
Transpose, 34-35 
determinant of, 113 
invertibility, 49 
of lower triangular matrix, 69 
properties, 48-49
vector spaces, 251-252 
Triangle: 
area of, 176-177 

Sierpinski, 624, 626, 628-629, 631-632 
Triangle inequalities: 
for distances, 149-150, 357 
for vectors, 149-150, 357 


Triangle rule for vector addition, 132 
Triangular matrices, 69-70 
diagonalizability of, 307 
eigenvalues of, 294-295 
Triangulation, 697-698 
Trigonometric polynomial, 396 
Trivial solution, 17, 327 
Turing, Alan Mathison, 493 
2x2 matrices: 

cofactor expansions of, 107-108 
determinants, 110 
eigenvalues of, 319-321 
games, 573-576 
inverse of, 45-46
vector space, 186 
2x2 vector, eigenvectors, 292 
Two-person zero-sum games, 569-573 
Two-point vector equations, in R n , 
167-168 

2-Step connection, directed graphs, 561, 
564-565 
2-space, 131 
2-tuples, 135 

U 

Underdetermined linear system, 255 
Unified field theory, 136 
Union, A4 

Unitary diagonalization, of Hermitian
matrices, 441-442

Unitary matrices, 437-438, 440-442

Unit circle, 348 

Units of measurement, 213

Unit sphere, 348 

Unit vectors, 143-145, 316, 346 

Unknowns, 2 

Unstable algorithms, 22 

Upper Hessenberg decomposition, 415 

Upper Hessenberg form, 415 

Upper triangular matrices, 69, 110, 294 

V 

Vaccine distribution, 575-576 
Vectors, 131 

angle between, 146-149, 356-357 
arithmetic operations, 132-134, 137-138 
“basis vectors,” 214 
collinear vectors, 133-134 
column-matrix form of, 237 
column-vector form of, 140 
comma-delimited form of, 139, 217, 237 
components of, 134-135 


in coordinate systems, 134-135 
coordinate vectors, 218-219 
dot product, 145-148, 150-152 
equality of, 132, 137-138 
equivalence of, 132, 137-138 
geometric vectors, 131
linear combinations of, 140, 144-145, 

195, 197-198 

linear independence of, 196, 202-210 
nonzero vectors, 200 
normalizing, 144 
norm of, 160 

notation for, 131, 139-140 

orthogonal vectors, 155-158, 316 

parallelogram equation for, 150 

parallel vectors, 133-134 

perpendicular vectors, 155 

probability vector, 334 

in R 2 and R 3 , 131-140 

real and imaginary parts of, 314-315 

in R n , 135-139

row-matrix form of, 237 

row vectors, 26, 27, 40, 168-169, 237 

row-vector form of, 139

solution vectors, 168-169 

standard unit vectors, 144, 175-176, 

196, 203-204 
state vector, 334 

triangle inequality for, 149-150 
unit vectors, 143-145, 316, 346 
zero vector, 132, 137 
Vector addition: 
matrix games, 572 
parallelogram rule for, 132
in R 2 and R 3 , 132, 134 
triangle rule for, 132 
Vector equations: 
of lines and planes in R 4 , 166-167 
of lines in R 2 and R 3 , 164-166 
of planes in R 3 , 164-166
two-point vector equations in R n ,
167-168

Vector forms, 165 
Vector space, 183 
axioms, 183-184 

complex vector spaces, 184, 313-324 
dimensions of, 222 
examples of, 185-189, 216 
finite-dimensional vector spaces, 214, 
216-217, 224-225 

infinite-dimensional vector spaces, 214, 
216 

of infinite real number sequences, 185 
isomorphic, 466
of m x n matrices, 186-187
n -dimensional, 224 
of real-valued functions, 187
real vector space, 183, 184 
subspaces, 191-200, 453 
for transposes of matrices, 251-252 
of 2 x 2 matrices, 186 
zero vector space, 185, 222 
Vector space scalars, 184 
Vector subtraction, in R 2 and R 3 , 133 
Venn Diagrams, A4 
Vertex matrix, 559-561 
Vertex points, 697-698 
Vertices, graphs, 559-560 
Viewing audience maximization, 573 
Visualization, of three-dimensional 
objects, 593-595 


Volts (units), 86 
Voltage rises/drops, 86, 87 
von Neumann, John, 642 

W 

Warps, 695-699 

affine transformations with, 696 
defined, 696 
time-varying, 698-699 
Webgraph, 704 
Weight, 346 

Weighted Euclidean inner products, 
346-349 

Weyl, Herman Klaus, 518 
Wildlife migration, as Markov chain, 
336-337 

Wilson, Edwin, 173 
Work, 163 


Wronski, Jozef Hoene de, 208 
Wronskian, 209-210 

X 

X-linked inheritance, 661-662, 666-670 
X-ray computed tomography, 611-620

Y 

Yaw, 263 

Yorke, James, 637 

Z 

Zero matrices, 41 
Zero population growth, 679 
Zero subspace, 192 
Zero-sum matrix games, two-person, 
569-573 

Zero transformations, 448, 452 
Zero vectors, 132, 137 
Zero vector space, 185, 222 



APPLICATIONS AND HISTORICAL TOPICS


Aeronautical Engineering 

Lifting force 95 
Solar powered aircraft 395 
Supersonic aircraft flutter 327
Yaw, pitch, and roll 264 

Astrophysics 

Kepler's laws 10.1* 

Measurement of temperature on Venus 394 

Biology and Ecology 

Air quality prediction 343 
Forest management 10.8* 

Genetics 344, 10.15*

Harvesting of animal populations 10.17* 
Population dynamics 343, 10.16*

Wildlife migration 338-339 

Business and Economics 

Game theory 10.6* 

Leontief input-output models 96-100, 10.7* 
Market share 334-336, 343 
Sales and cost analysis 38, 39 
Sales projections using least squares 395 

Calculus 

Approximate integration 93-94 
Derivatives of matrices 102 
Integral inner products 353 
Partial fractions 25 

Chemistry 

Balancing chemical equations 88-91 

Civil Engineering 

Equilibrium of rigid bodies Module 5** 
Traffic flow 85-86 

Computer Science 

Color models for digital displays 67, 136, 140 
Computer graphics 10.9* 

Facial recognition 297 
Fractals 10.12* 

Google site ranking 10.20* 

Warps and morphs 10.19* 

Cryptography 

Hill ciphers 10.14* 


Differential Equations 

First-order linear systems 328-332 

Electrical Engineering 

Circuit analysis 84-85, 86-88 
Digitizing signals 185 
LRC circuits 333 

Geometry in Euclidean Space 

Angle between a diagonal of a cube and an edge 147 

Direction angles and cosines 154 

Parallelogram law 150 

Generalized theorem of Pythagoras 160, 360 

Reflection about a line 268 

Rotation about a line 411 

Rotation of coordinate axes 406-409 

Vector methods in plane geometry Module 4** 

Library Science 

ISBN numbers 153 

Linear Algebra Historical Figures 

Harry Bateman 519 
Eugene Beltrami 520 
Maxime Bocher 7 
Viktor Bunyakovsky 149 
Lewis Carroll 108 
Augustin Cauchy 122 
Arthur Cayley 35, 44 
Gabriel Cramer 125 
Leonard Dickson 123 
Albert Einstein 136 
Gotthold Eisenstein 30 
Leonhard Euler A10 
Leonardo Fibonacci 52 
Jean Fourier 400 
Carl Friedrich Gauss 15, 106 
Josiah Gibbs 146, 173 
Gene Golub 520 
Jorgen Pederson Gram 373 
Hermann Grassman 18 
Jacques Hadamard 129 
Charles Hermite 440 
Ludwig Hesse 435 
Karl Hessenberg 417 
George Hill 196 
Alton Householder 477
Camille Jordan 520 
Wilhelm Jordan 75 
Gustav Kirchhoff 88 
Joseph Lagrange 174
Wassily Leontief 96 
Andrei Markov 338 
Abraham de Moivre A10 
John Rayleigh 508 
Erhardt Schmidt 373 
Issai Schur 477 
Hermann Schwarz 149
James Sylvester 35, 107
Olga Todd 321
Alan Turing 495
John Venn A4 
Herman Weyl 520 
Josef Wronski 206

Mathematical History 

Early history of linear algebra 10.2* 

Mathematical Modeling 

Chaos 10.13* 

Cubic splines 10.3* 

Curve fitting 10, 24, 91-93, 10.1*

Exponential models 395 
Graph theory 10.5* 

Least squares 380-385, 387, 392-394, 399-400 
Linear, quadratic, cubic models 389-390 
Logarithmic models 395 
Markov chains 337, 10.4* 

Modeling experimental data 389-390, 393-394
Population growth 10.16* 

Power function models 395 

Mathematics 

Cauchy-Schwarz inequality 148-149 
Constrained extrema 431-434 
Fibonacci sequences 52 
Fourier series 398-400 
Hermite polynomials 221 
Laguerre polynomials 221 
Legendre polynomials 374-375 
Quadratic forms 419-429 
Sylvester's inequality 259 


Medicine and Health 

Computed tomography 10.11* 

Modeling human hearing 10.18* 

Nutrition 70 

Numerical Linear Algebra 

Cost in flops of algorithms 511-515 
Data compression 523-526 
FBI fingerprint storage 525
Fitting curves to data 10, 24, 91-93
Householder reflections 477 
LU-decomposition 493-501
Polynomial interpolation 92-94 
Power method 503-510 
Powers of a matrix 310-311 
QR-decomposition 376-377, 387 
Roundoff error, instability 22 
Schur decomposition 477 

Singular value decomposition 516-522, 523-524 
Spectral decomposition 415-416 
Upper Hessenberg decomposition 477 

Operations Research 

Assignment of resources Module 6** 

Linear programming Modules 1-3** 

Storage and warehousing 136 

Physics 

Displacement and work 163 
Experimental data 136 
Mass-spring systems 201-202 
Mechanical systems 137 

Motion of falling body using least squares 393-394 

Quantum mechanics 327 

Resultant of forces 154

Scalar moment of force 180 

Spring constant using least squares 392 

Static equilibrium 155

Temperature distribution 502 

Torque 180 

Probability and Statistics 

Arithmetic average 349 
Sample mean and variance 430 

Psychology 

Behavior 343 


